TY - JOUR
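T1 - COMPARTIR RECURSOS LINGÜÍSTICS DE QUALITAT EN L'ÀMBIT JURÍDIC PER DESENVOLUPAR LA TRADUCCIÓ AUTOMÀTICA NEURONAL PER A LES LLENGÜES EUROPEES AMB POCS RECURSOS
TT - Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages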
AU - Bago, Petra
AU - Castilho, Sheila
AU - Celeste, Edoardo
AU - Dunne, Jane
AU - Gaspari, Federico
AU - Gislason, Niels Runar
AU - Kasen, Andre
AU - Klubička, Filip
AU - Kristmannsson, Gauti
AU - McHugh, Helen
AU - Moran, Roisin
AU - Ni Loinsigh, Orla
AU - Olsen, Jon Arild
AU - Parra Escartin, Carla
AU - Ramesh, Akshai
AU - Resende, Natalia
AU - Sheridan, Paraic
AU - Way, Andy
N1 - Funding Information: The PRINCIPLE project was co-financed by the European Union Connecting Europe Facility under Action 2018-EU-IA-0050 with grant agreement INEA/CEF/ICT/A2018/1761837. The authors affiliated with the ADAPT SFI Research Centre also acknowledge the financial support of Science Foundation Ireland through the SFI Research Centres Programme under Grant Agreement No. 13/RC/2106-P2. The authors are grateful to the anonymous reviewers for their insightful comments and valuable suggestions on a previous version of this paper, and to the guest editors of the special issue and the journal editors for helpful advice in the preparation of the final version of the paper. Any errors remain the sole responsibility of the authors. Publisher Copyright: © 2022 Authors. All rights reserved.
PY - 2022/12
Y1 - 2022/12
N2 - This article reports some of the main achievements of the European Union-funded PRINCIPLE project in collecting high-quality language resources (LRs) in the legal domain for four under-resourced European languages: Croatian, Irish, Norwegian, and Icelandic. After illustrating the significance of this work for developing translation technologies in the context of the European Union and the European Economic Area, the article outlines the main steps of data collection, curation, and sharing of the LRs gathered with the support of public and private data contributors. This is followed by a description of the development pipeline and key features of the state-of-the-art, bespoke neural machine translation (MT) engines for the legal domain that were built using this data. The MT systems were evaluated with a combination of automatic and human methods to validate the quality of the LRs collected in the project, and the high-quality LRs were subsequently shared with the wider community via the ELRC-SHARE repository. The main challenges encountered in this work are discussed, emphasising the importance and the key benefits of sharing high-quality digital LRs.
KW - evaluation
KW - language resources
KW - legal translation
KW - neural machine translation
KW - under-resourced languages
UR - https://www.scopus.com/pages/publications/85150417040
U2 - 10.2436/rld.i78.2022.3741
DO - 10.2436/rld.i78.2022.3741
M3 - Article
SN - 0212-5056
SP - 9
EP - 34
JO - Revista de Llengua i Dret
JF - Revista de Llengua i Dret
IS - 78
ER -