A review of the large-scale European language model 'EuroLLM' that supports 24 official EU languages and 11 other languages including Japanese, which is currently available as open source



A consortium of European research institutes and companies has developed a large-scale language model (LLM) called 'EuroLLM' that supports all 24 official EU languages as well as 11 others, with the aim of countering the current English-centric AI development and realizing AI that reflects Europe's linguistic diversity. The model is now available as open source.

eurollm.io
https://eurollm.io/



◆24 official EU languages supported by EuroLLM
・Bulgarian
・Croatian
Czech
・Danish
Dutch
·English
・Estonian
Finnish
·French
German
Greek
Hungarian
Irish
Italian
・Latvian
・Lithuanian
・Maltese
Polish
Portuguese
Romanian
・Slovak
・Slovenian
Spanish
Swedish

◆11 other languages supported by EuroLLM
Arabic
・Catalan
・Chinese
・Galician
Hindi
·Japanese
·Korean
Norwegian
Russian
Turkish
・Ukrainian

A EuroHPC Success Story: Speaking Freely with EuroLLM - EuroHPC JU

https://www.eurohpc-ju.europa.eu/eurohpc-success-story-speaking-freely-eurollm_en

The project involves prominent research institutions and companies from across Europe, including Unbabel , the Technical University of Lisbon , the University of Edinburgh and the University of Paris-Saclay .

The development team stated, 'Most LLMs are English-centric and tend to reflect English-speaking culture. We aimed to create a model that would perform fairly across all European languages.'

A major challenge in development was that Greek texts sometimes consume five to six times more tokens than English texts, leading to unequal usage costs. This problem was solved by limiting the proportion of English in the training data to 50% and allocating sufficient data to other languages. The training system was powered by the EuroHPC supercomputer ' MareNostrum 5. '

While the Hacker News community has seen positive reactions, such as 'It's an important research model funded by European taxpayers' and 'It's a great initiative to protect linguistic diversity,' some have pointed out practical issues, such as 'How competitive is it in terms of performance compared to existing commercial models?' According to LifeArchitect.ai , the MMLU value, a benchmark index used to evaluate the performance of language models for tasks in various fields, was 52.5, about 10 points lower than other models in the same class.



EuroLLM currently has models with 1.7B and 9B parameters available, with plans to release a more powerful 22B model and a multimodal model that handles images and audio in the future.

EuroLLM-9B Hugging Face

https://huggingface.co/utter-project/EuroLLM-9B

Let's try out EuroLLM. Download the model using LM Studio on Windows.



We asked a simple prompt in English: 'How do you say good morning in 24 languages?' and got answers translated into each language.



We also asked about 11 other languages and received answers.



Next, we asked the same question about sushi in Japanese. Although some of the answers were insufficient, including mistakes such as 'Japanese: sashi,' we learned that sushi is recognized in the EU as well.



The development team stated, 'Our goal is to accelerate innovation in Europe. We want to give everyone the opportunity to use this European-made LLM and build new things on top of it. In the future, we plan to develop a more comprehensive model that supports not only text, but also images and audio.'

in AI,   Review, Posted by darkhorse_logmk