The paper “BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages” by Benjamin Heinzerling and Michael Strube has been accepted for LREC 2018, which will take place in Miyazaki, Japan, from May 7th to 12th, 2018. The paper is accompanied by a new natural language resource, a collection of pre-trained subword embeddings, which can be downloaded from the project's GitHub page. Incidentally, the European Media Lab has been a regular sponsor of the LREC conference for several consecutive years.
The Heidelberg Institute for Theoretical Studies (HITS) was established in 2010 by the physicist and SAP co-founder Klaus Tschira (1940-2015) and the Klaus Tschira Foundation as a private, non-profit research institute. HITS conducts basic research in the natural sciences, mathematics, and computer science, with a focus on processing, structuring, and analyzing large amounts of complex data and on developing computational methods and software. The research fields range from molecular biology to astrophysics. The shareholders of HITS are the HITS-Stiftung (a subsidiary of the Klaus Tschira Foundation), Heidelberg University, and the Karlsruhe Institute of Technology (KIT). HITS also cooperates with other universities and research institutes and with industrial partners. The base funding of HITS is provided by the HITS-Stiftung with funds received from the Klaus Tschira Foundation. The primary external funding agencies are the Federal Ministry of Education and Research (BMBF), the German Research Foundation (DFG), and the European Union.