COMPARTIR RECURSOS LINGÜÍSTICS DE QUALITAT EN L'ÀMBIT JURÍDIC PER DESENVOLUPAR LA TRADUCCIÓ AUTOMÀTICA NEURONAL PER A LES LLENGÜES EUROPEES AMB POCS RECURSOS

Translated title of the contribution: SHARING HIGH-QUALITY LANGUAGE RESOURCES IN THE LEGAL DOMAIN TO DEVELOP NEURAL MACHINE TRANSLATION FOR UNDER-RESOURCED EUROPEAN LANGUAGES

Petra Bago, Sheila Castilho, Edoardo Celeste, Jane Dunne, Federico Gaspari, Niels Runar Gislason, Andre Kasen, Filip Klubička, Gauti Kristmannsson, Helen McHugh, Roisin Moran, Orla Ni Loinsigh, Jon Arild Olsen, Carla Parra Escartin, Akshai Ramesh, Natalia Resende, Paraic Sheridan, Andy Way

Research output: Contribution to journal › Article › peer-review

Abstract

This article reports some of the main achievements of the European Union-funded PRINCIPLE project in collecting high-quality language resources (LRs) in the legal domain for four under-resourced European languages: Croatian, Irish, Norwegian, and Icelandic. After illustrating the significance of this work for developing translation technologies in the context of the European Union and the European Economic Area, the article outlines the main steps of data collection, curation, and sharing of the LRs gathered with the support of public and private data contributors. This is followed by a description of the development pipeline and key features of the state-of-the-art, bespoke neural machine translation (MT) engines for the legal domain that were built using this data. The MT systems were evaluated with a combination of automatic and human methods to validate the quality of the LRs collected in the project, and the high-quality LRs were subsequently shared with the wider community via the ELRC-SHARE repository. The main challenges encountered in this work are discussed, emphasising the importance and the key benefits of sharing high-quality digital LRs.

Original language: Catalan
Pages (from-to): 9-34
Number of pages: 26
Journal: Revista de Llengua i Dret
Issue number: 78
Publication status: Published - Dec 2022

Bibliographical note

Publisher Copyright: © 2022 Authors. All rights reserved.

Other keywords

  • evaluation
  • language resources
  • legal translation
  • neural machine translation
  • under-resourced languages
