ELITR: Complementing Interpreters and Starting to Take Notes for You

The EU project ELITR has successfully ended and passed its review. What are the tangible outcomes?

We addressed two major fields:

  • Simultaneous speech translation in a conference setting.
  • Automatic note-taking (“minuting”) of small group meetings.

Simultaneous speech translation in a conference setting

The first area is briefly illustrated in our concluding video:

Building upon the work of simultaneous interpreters, our solution can substantially extend the set of languages offered by conference organizers. We ran many test sessions and demonstrations of the system, with several notable ones:

  1. We covered the discussion that followed the premiere of the theAItre play in February 2021. The discussion had a Czech-to-English interpreter behind the scenes, and we benefited from her output, translating it live into 42 other languages. The human element in the process was critical: the interpreter easily handled the added burden of face masks worn by discussion participants, and she very nicely “segmented” the spontaneous speech into individual sentences, a task that is difficult but very important for automatic translation because translation systems are generally trained to process well-formed sentences.
  2. We tested our system at the EUROSAI 2021 Congress that started the EUROSAI presidency of the Supreme Audit Office of the Czech Republic. The system followed 5 simultaneous versions of the speeches (the original speaker, typically speaking English, and the interpretation into German, French, Spanish and Russian), and we switched live among the sources depending on the immediate quality of the input sound (the congress was run in hybrid form) and the quality of the automatic recognition (different accents and speech characteristics affected our systems differently). A report on this exercise was published at ASLTRW 2021: Operating a Complex SLT System with Speakers and Human Interpreters.
  3. At META-FORUM 2022 in June, we tested our system with novel under-the-hood features: (1) the speech recognition allowed us to add, at runtime, words that were missing or mis-recognized, and (2) we had a duct-tape prototype of live manual correction of recognition errors at the level of short phrases. Live post-editing of speech recognition output is not an easy discipline. It required the full focus of our operator and quick reactions. Somewhat unexpectedly, there was a rather frequent need not just to add correction rules (to fix the mis-recognized “Russian program” into the intended “research program”, or to indicate that “Joachim” in this meeting should actually be spelled “Joakim”) but also to retract them (as soon as a different and differently spelled “Joachim” entered the scene, or when the discussion moved again to Georgia instead of talking about Georg’s achievements). Some mis-recognitions were critical, such as “extinction” instead of “extension”; some were amusing (“I have got GPT-3 questions.” instead of “thirty questions”). Some of the speakers’ accents would even require speaker-specific corrections (the filler phrase “and the like” was pronounced as “and the lie”, which, sadly, our ASR was picking up “exactement”).
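
The add-and-retract workflow for correction rules described in the last item can be pictured with a small sketch. This is not the ELITR implementation: the class name `CorrectionRules` and its methods are hypothetical, and we assume simple case-insensitive, whole-phrase replacement on the recognized text.

```python
import re


class CorrectionRules:
    """Hypothetical store of phrase-level ASR correction rules (illustration only)."""

    def __init__(self):
        # Maps a lower-cased mis-recognized phrase to the intended phrase.
        self._rules = {}

    def add(self, wrong, right):
        """Register or overwrite a rule: replace `wrong` with `right`."""
        self._rules[wrong.lower()] = right

    def retract(self, wrong):
        """Remove a rule once it no longer fits the discussion."""
        self._rules.pop(wrong.lower(), None)

    def apply(self, text):
        """Apply all active rules to one piece of recognized text."""
        # Longer phrases go first so they are not pre-empted by shorter ones.
        for wrong in sorted(self._rules, key=len, reverse=True):
            right = self._rules[wrong]
            pattern = r"\b" + re.escape(wrong) + r"\b"
            # A function replacement avoids backslash processing in `right`.
            text = re.sub(pattern, lambda m, r=right: r, text, flags=re.IGNORECASE)
        return text


rules = CorrectionRules()
rules.add("Russian program", "research program")
rules.add("Joachim", "Joakim")
print(rules.apply("Joachim presented the Russian program."))

# A differently spelled Joachim joins the meeting, so the rule is retracted.
rules.retract("Joachim")
print(rules.apply("Joachim presented the Russian program."))
```

Even in this toy form, the operator has to decide at every moment which rules are still valid, which is where most of the required attention goes.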

We highlight some of our research results in speech translation:

The topic of multi-lingual multi-source simultaneous speech translation and the current state of the art in comparing human simultaneous interpretation with the systems was covered in Ondřej Bojar’s invited talk at WMT 2022 in Abu Dhabi.

Automatic Minuting (Note-Taking)

Text summarization is an old topic in natural language processing. ELITR has defined a related but different task of automatic creation of minutes, or “minuting”.

As high-risk research, we focused on the goal of converting a transcript of a multi-party meeting (think project or small group meetings) into a bulleted list of points. This challenge is not something a three-year project could resolve, but we laid solid foundations in the area.

We prepared and released the ELITR Minuting Corpus, on which systems for minuting can be trained and tested. Collecting recordings of genuine project meetings is difficult for many reasons, including the privacy of meeting participants. In ELITR, we defined and applied a so-called “double consent” strategy: we first get preliminary approval from everyone to record and process the recording in a confidential way (correcting the automatic transcript, carefully de-identifying the text, and then creating meeting minutes), and then we share a preview of the data with the participants again to get their final consent. At times, further removal of sensitive parts (beyond the requirements of GDPR) was needed. With this second consent, the risk of harming anyone’s privacy is substantially decreased.

With the ELITR Minuting Corpus data, we organized AutoMin 2021, the first shared task of this kind. In total, 10 teams from academia as well as industry from all around the world, including Japan, India, Germany, Switzerland, Russia, and the UK, attempted to automatically produce the best minutes given de-identified meeting transcripts.

The resulting automatic minutes were evaluated automatically with variants of the well-known ROUGE score and manually by our annotators in terms of Adequacy, Fluency and Grammatical Correctness. Full details of the evaluation are in our Overview of the First Shared Task on Automatic Minuting (AutoMin) at Interspeech 2021, and papers describing the individual participating systems are in the AutoMin proceedings.
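
For readers unfamiliar with ROUGE, the following sketch shows the idea behind its simplest variant, ROUGE-1, which counts overlapping words between a candidate and a reference. It is an illustration only, not the AutoMin evaluation code: we assume lower-casing and whitespace tokenization, and the function name `rouge1` and the example sentences are made up.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Simplified ROUGE-1: unigram precision, recall and F1 (illustration only)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each word counts at most as often as it occurs in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up minute item and reference, not taken from the corpus.
p, r, f = rouge1("deadline moved to friday", "the deadline was moved to friday")
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Word overlap of this kind says nothing about whether the minutes are factually faithful to the meeting, which is one reason the manual evaluation was run alongside it.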

Building upon the previous success, AutoMin 2023 is in preparation. A very important goal we set for the 2023 edition is to improve manual and automatic measures of the quality of minutes. While large pre-trained models deliver outstanding outputs on average, they are very prone to hallucinating content that was not discussed. Also, how to reach agreement on what is important in a meeting and how best to personalize meeting minutes remain active research areas.

Here are our key scientific results in minuting:

Looking forward to seeing you at AutoMin 2023!