Scope (Languages)

In addition to addressing technological challenges in ASR, SLT, MT, and automatic minuting, ELITR will cover a large number of EU and EUROSAI languages (see the list at the bottom of this page) to reduce the language barrier in spoken and written communication. Our ASR technology will cover at least 6 EU languages, our MT at least all 24 official EU languages, and our prototype of automatic minuting will target at least 2. Further EUROSAI languages will be added on an experimental basis, depending on the research and business needs of the partners. Our second use case (interpretation of conference calls in the alfaview platform) may also bring up unexpected languages, e.g. Asian ones (Chinese, Japanese or Indian languages). Because of the high number of languages covered, especially in MT, where we will cover between 24 × 23 = 552 and 43 × 42 = 1806 translation directions, we will focus on a subset of languages as primary languages for research and evaluation.
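The direction counts quoted above follow from counting ordered (source, target) pairs of distinct languages. A minimal sketch of that arithmetic, where the function name `translation_directions` is an illustrative assumption and not part of any ELITR software:

```python
def translation_directions(n_languages: int) -> int:
    """Count ordered (source, target) pairs with source != target."""
    return n_languages * (n_languages - 1)


# Language counts taken from the text: 24 official EU languages,
# 43 EUROSAI languages.
EU_LANGUAGES = 24
EUROSAI_LANGUAGES = 43

print(translation_directions(EU_LANGUAGES))       # 552
print(translation_directions(EUROSAI_LANGUAGES))  # 1806
```

Each direction is counted separately (e.g. Czech to German and German to Czech are two directions), which is why the count is n × (n − 1) and not half of it.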

The intended language coverage of the ELITR project balances needs against the realistically available capacity and training data. The following table summarizes it.

| Technology | Primary Focus | Covered | Experimental |
| --- | --- | --- | --- |
| ASR | English, German | French, Spanish, Italian, Russian | Czech |
| SLT & MT | English, German → English, German, Czech | all EU languages → all EU languages | all EUROSAI languages → all EUROSAI languages |
| Summarization | English, Czech | | |

Table 1: Level of support for languages and language pairs in the ELITR project.

Languages and language pairs with Primary Focus will receive full attention and dedicated modelling techniques (not necessarily separate models, given the multilingual approach, where we aim to save processing time and/or reduce dependence on large quantities of training data by handling several languages at once), as well as improvements in document-level translation quality. Covered languages will be fully integrated and provided by ELITR to the user partners AV and SAO as needed for their use cases. For experimentally supported languages, specific language pairs will be selected for research or business reasons as the project progresses. Likely candidates include Arabic (a United Nations language, 310M speakers) and Ukrainian (45M speakers, to explore its relatedness to Russian).

Compared to existing solutions, e.g. the CEF eTranslation service, our objective is to substantially advance the quality and efficiency of MT, especially in spoken language translation, document-level translation, and low-resource language pairs, in line with the use cases of this project.

The 24 official EU languages are:

Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish, and Swedish.

The 43 EUROSAI languages comprise all the EU languages plus the following:

Albanian, Arabic, Armenian, Azerbaijani, Belorussian, Bosnian, Georgian, Hebrew, Icelandic, Kazakh, Luxembourgish, Macedonian, Moldovan, Montenegrin, Norwegian, Russian, Serbian, Turkish, and Ukrainian.
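As a sanity check, the two lists above can be loaded into sets to confirm that they contain 24 and 19 languages, do not overlap, and together give the 43 EUROSAI languages. This is only an illustrative sketch; the variable names are assumptions, and the language names are copied as spelled on this page.

```python
from itertools import permutations

EU = {
    "Bulgarian", "Croatian", "Czech", "Danish", "Dutch", "English",
    "Estonian", "Finnish", "French", "German", "Greek", "Hungarian",
    "Irish", "Italian", "Latvian", "Lithuanian", "Maltese", "Polish",
    "Portuguese", "Romanian", "Slovak", "Slovene", "Spanish", "Swedish",
}

EUROSAI_ONLY = {
    "Albanian", "Arabic", "Armenian", "Azerbaijani", "Belorussian",
    "Bosnian", "Georgian", "Hebrew", "Icelandic", "Kazakh",
    "Luxembourgish", "Macedonian", "Moldovan", "Montenegrin",
    "Norwegian", "Russian", "Serbian", "Turkish", "Ukrainian",
}

# The full EUROSAI set is the union of the two lists.
EUROSAI = EU | EUROSAI_ONLY

# Enumerate every ordered (source, target) pair of distinct languages.
eu_directions = list(permutations(sorted(EU), 2))
eurosai_directions = list(permutations(sorted(EUROSAI), 2))

print(len(EU), len(EUROSAI_ONLY), len(EUROSAI))     # 24 19 43
print(EU.isdisjoint(EUROSAI_ONLY))                  # True
print(len(eu_directions), len(eurosai_directions))  # 552 1806
```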

We note that some EUROSAI languages (Moldovan, Montenegrin) lack even the most basic support solutions worldwide, in part because their official status is not universally recognized.