TY - GEN
T1 - Bootstrapping Icelandic Knowledge Graph Data
AU - Friðriksdóttir, Steinunn Rut
AU - Einarsson, Hafsteinn
PY - 2022
Y1 - 2022
N2 - A knowledge graph is a semantic network of named entities, e.g. people, objects and organizations, that can be used to uniquely identify mentions in text. In order to create such a graph, it is crucial to possess plenty of specifically annotated data that includes not only the entities themselves but the relations that hold between them. Traditionally, such data has only been available for high-resource languages. In this paper, we present our approach to bootstrap training data using machine translation and open relation extraction methods. We hypothesize that by automatically translating our data to English, we can perform relation extraction using SOTA language models before translating the entities back to the source language, significantly reducing startup costs when developing such models for a given language. Our results show that this approach has promise for lower-resource languages such as Icelandic. However, it is currently limited due to the quality of translation and open relation extraction models.
AB - A knowledge graph is a semantic network of named entities, e.g. people, objects and organizations, that can be used to uniquely identify mentions in text. In order to create such a graph, it is crucial to possess plenty of specifically annotated data that includes not only the entities themselves but the relations that hold between them. Traditionally, such data has only been available for high-resource languages. In this paper, we present our approach to bootstrap training data using machine translation and open relation extraction methods. We hypothesize that by automatically translating our data to English, we can perform relation extraction using SOTA language models before translating the entities back to the source language, significantly reducing startup costs when developing such models for a given language. Our results show that this approach has promise for lower-resource languages such as Icelandic. However, it is currently limited due to the quality of translation and open relation extraction models.
UR - https://uvaauas.figshare.com/articles/conference_contribution/Fri_riksd_ttir_Einarsson_2022_Bootstrapping_Icelandic_Knowledge_Graph_Data/20367948
U2 - 10.21942/UVA.20367948
DO - 10.21942/UVA.20367948
M3 - Conference contribution
BT - Proceedings of the ESSLLI 2022 Student Session
ER -