A review of the large-scale European language model 'EuroLLM' that supports 24 official EU languages and 11 other languages including Japanese, which is currently available as open source

 
eurollm.io
https://eurollm.io/

 ◆24 official EU languages supported by EuroLLM
・Bulgarian
・Croatian
 Czech
・Danish
 Dutch
 ·English
・Estonian
 Finnish
 ·French
 German
 Greek
 Hungarian
 Irish
 Italian
・Latvian
・Lithuanian
・Maltese
 Polish
 Portuguese
 Romanian
・Slovak
・Slovenian
 Spanish
 Swedish
 ◆11 other languages supported by EuroLLM
 Arabic
・Catalan
・Chinese
・Galician
 Hindi
 ·Japanese
 ·Korean
 Norwegian
 Russian
 Turkish
・Ukrainian
 A EuroHPC Success Story: Speaking Freely with EuroLLM - EuroHPC JU
 
The project involves prominent research institutions and companies from across Europe, including Unbabel , the Technical University of Lisbon , the University of Edinburgh and the University of Paris-Saclay .
The development team stated, 'Most LLMs are English-centric and tend to reflect English-speaking culture. We aimed to create a model that would perform fairly across all European languages.'
A major challenge in development was that Greek texts sometimes consume five to six times more tokens than English texts, leading to unequal usage costs. This problem was solved by limiting the proportion of English in the training data to 50% and allocating sufficient data to other languages. The training system was powered by the EuroHPC supercomputer ' MareNostrum 5. '
While the Hacker News community has seen positive reactions, such as 'It's an important research model funded by European taxpayers' and 'It's a great initiative to protect linguistic diversity,' some have pointed out practical issues, such as 'How competitive is it in terms of performance compared to existing commercial models?' According to LifeArchitect.ai , the MMLU value, a benchmark index used to evaluate the performance of language models for tasks in various fields, was 52.5, about 10 points lower than other models in the same class.

 EuroLLM currently has models with 1.7B and 9B parameters available, with plans to release a more powerful 22B model and a multimodal model that handles images and audio in the future.
 EuroLLM-9B Hugging Face
 
Let's try out EuroLLM. Download the model using LM Studio on Windows.

 We asked a simple prompt in English: 'How do you say good morning in 24 languages?' and got answers translated into each language.
 

 We also asked about 11 other languages and received answers.
 
 Next, we asked the same question about sushi in Japanese. Although some of the answers were insufficient, including mistakes such as 'Japanese: sashi,' we learned that sushi is recognized in the EU as well.
 

 The development team stated, 'Our goal is to accelerate innovation in Europe. We want to give everyone the opportunity to use this European-made LLM and build new things on top of it. In the future, we plan to develop a more comprehensive model that supports not only text, but also images and audio.'
Related Posts:







