Background

ELITR goals and plans build on the following papers and activities of team members:

[1] CUNI Submission in WMT17: Chimera Goes Neural, Ondřej Bojar, Tom Kocmi, David Mareček, Roman Sudarikov and Dušan Variš. In Proceedings of the EMNLP 2017 Second Conference on Machine Translation, Copenhagen, Denmark, pp. 248–256, 2017.

[2] Neural Monkey: An Open-source Tool for Sequence Learning, Jindřich Helcl and Jindřich Libovický. Software: https://github.com/ufal/neuralmonkey, http://neural-monkey.readthedocs.io/.

[3] Neural Monkey: The Current State and Beyond, Jindřich Helcl, Jindřich Libovický, Tom Kocmi, Tomáš Musil, Ondřej Cífka, Dušan Variš and Ondřej Bojar. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, Vol. 1: MT Researchers’ Track, Stroudsburg, PA, pp. 168–176, 2018.

[4] Results of the WMT17 Neural MT Training Task, Ondřej Bojar, Jindřich Helcl, Tom Kocmi, Jindřich Libovický and Tomáš Musil. In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers, Stroudsburg, PA, pp. 525–533, 2017.

[5] Curriculum Learning and Minibatch Bucketing in Neural Machine Translation, Tom Kocmi and Ondřej Bojar. In Proceedings of the International Conference Recent Advances in Natural Language Processing, pp. 379–386, INCOMA Ltd., Šumen, Bulgaria, ISBN 978-954-452-048-9, 2017.

[6] R Sennrich, B Haddow and A Birch (2016). “Improving Neural Machine Translation Models with Monolingual Data”, ACL.

[7] R Sennrich, B Haddow and A Birch (2016). “Neural Machine Translation of Rare Words with Subword Units”, ACL.

[8] R Bawden, R Sennrich, A Birch and B Haddow (2018). “Evaluating Discourse Phenomena in Neural Machine Translation”, NAACL.

[9] R Sennrich, O Firat, K Cho, A Birch, B Haddow, J Hitschler, M Junczys-Dowmunt, S Läubli, A V Miceli Barone, J Mokry and M Nadejde (2017). “Nematus: a Toolkit for Neural Machine Translation”, EACL.

[10] A V Miceli Barone, B Haddow, U Germann, R Sennrich (2017). “Regularization techniques for fine-tuning in neural machine translation”, EMNLP.

[11] Seligman, Mark, Alex Waibel, Andrew Joscelyne: TAUS Speech-to-Speech Translation Technology Report, TAUS Report, 2017

[12] Sperber, Matthias, Graham Neubig, Jan Niehues, Alex Waibel: Neural Lattice-to-Sequence Models for Uncertain Inputs, In Proceedings of EMNLP, Copenhagen, Denmark, September 7-11, 2017

[13] Zenkel, Thomas, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel: Comparison of Decoding Strategies for CTC Acoustic Models. In Proceedings of Interspeech 2017, Stockholm, Sweden, August 20-24, 2017

[14] Cho, Eunah, Niehues, Jan, Waibel, Alex: NMT-based Segmentation and Punctuation Insertion for Real-time Spoken Language Translation. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (Interspeech 2017), Stockholm, Sweden.

[15] Ha, Thanh-Le, Jan Niehues, Alex Waibel: Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder. In Proceedings of IWSLT, Seattle, WA, USA, December 8-9, 2016

Other relevant projects, activities and products

[1] HimL: Health in my Language (H2020, 2015–2018). The project translated public health information, with a focus on domain adaptation, accuracy, and morphologically rich languages. Ondřej Bojar was responsible for the department’s contribution.

[2] QT21: Quality Translation 21 (H2020 RIA, 2015–2018). The aim of the project was to develop improved statistical, neural and machine-learning based translation models for challenging languages and resource scenarios, together with improved evaluation and continuous learning from mistakes, guided by a systematic analysis of quality barriers. Ondřej Bojar was responsible for the department’s contribution.

[3] CRACKER: Cracking the Language Barrier (H2020, 2015–2018).

[4] MosesCORE: promoting open-source Machine Translation (FP7, 2012–2015). Ondřej Bojar site PI.

[5] EuroMatrixPlus: Bringing Machine Translation for European Languages to the User (FP7, 2009–2015). Ondřej Bojar site PI since 2011.

[6] SUMMA: Scalable Understanding of Multilingual Media. H2020, 2016–2019.

[7] MMT: Modern Machine Translation. H2020, 2015–2018.

[8] TraMOOC: Translation for Massive Open Online Courses. H2020, 2015–2018.

[9] EU-BRIDGE: Bridges Across the Language Divide (eu-bridge.eu). FP7, 2012–2015.

[10] SCRIPTS: System for cross-language information processing, translation, and summarization. IARPA MATERIAL, 2017–2022. Low-resource machine translation and speech recognition for information retrieval.

[11] H2020 SecondHands (partner) Design and construction of a robot that can offer help to a maintenance technician in a pro-active manner. The robot acts as a second pair of hands that can assist the technician when he/she is in need of help. https://secondhands.eu

[12] H2020 QT21 (partner) Continuous learning from mistakes, guided by a systematic analysis of quality barriers, informed by human translators. http://www.qt21.eu

[13] German Research Foundation (DFG) Machine Translation for Education (participants) Research in speech translation, specifically when translating from German into other languages and strategies for user corrections by student bodies and how to learn from them.

[14] FP7 EU-BRIDGE (coordinator) Automatic transcription and translation services for broadcasting companies, the EU parliament, university lectures, and webinars. http://project.eu-bridge.eu

[15] FP7 MetaNet (member of technology council) A Network of Excellence forging the Multilingual Europe Technology Alliance. http://www.meta-net.eu

[16] EU-Bridge: Bridges Across the Language Divide (eu-bridge.eu). FP7, 2012–2015.

[17] STON: Speech and Language Subtitling Technology in Dutch. VRT, 2015–2016.

[18] CLAST: Cross Language Automatic Subtitling Technology. Autonomous Province of Trento, 2015–2017.