Meta releases 'OMol25,' a large-scale quantum chemistry dataset containing over 100 million molecular structure records, and the AI model 'Universal Model for Atoms (UMA)'

Meta's FAIR (Fundamental AI Research) team, working with researchers from multiple institutions, has announced a very large, highly accurate quantum chemistry dataset, 'Open Molecules 2025 (OMol25),' and an AI model trained on it, 'Universal Model for Atoms (UMA).'
[2505.08762] The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models
Sharing new breakthroughs and artifacts supporting molecular property prediction, language processing, and neuroscience
https://ai.meta.com/blog/meta-fair-science-new-open-source-releases/
Computational Chemistry Unlocked: A Record-Breaking Dataset to Train AI Models has Launched - Berkeley Lab – Berkeley Lab News Center
https://newscenter.lbl.gov/2025/05/14/computational-chemistry-unlocked-a-record-breaking-dataset-to-train-ai-models-has-launched/
UMA: A Family of Universal Models for Atoms | Research - AI at Meta
https://ai.meta.com/research/publications/uma-a-family-of-universal-models-for-atoms/
OMol25 is a dataset containing the results of more than 100 million density functional theory (DFT) calculations, created to support the development and evaluation of machine learning models in molecular chemistry. It spans a wide variety of molecular systems: approximately 83 million unique molecular structures, 83 elements, and molecules of up to 350 atoms. All calculations were performed at the highly accurate 'ωB97M-V/def2-TZVPD' level of theory.

In addition, OMol25 includes a variety of physical quantities such as energy, forces, spin, charge, orbital energies,
The data are grouped into several areas, including biomolecules, metal complexes, electrolytes, and recalculations of existing community datasets, with structure generation and molecular dynamics methods chosen to suit each area. Practical benchmark tasks in molecular modeling, such as ligand binding energy and structure reoptimization, are also defined, and models are evaluated against them.

The construction of OMol25 took approximately 6 billion
facebook/OMol25 · Hugging Face
https://huggingface.co/facebook/OMol25
UMA (Universal Models for Atoms) is a family of large-scale, general-purpose interatomic potential models trained on all the datasets Meta FAIR has released over the past five years, including OMol25. The models predict atomic-level properties and behavior quickly and accurately across many fields of chemistry and materials science. At the time of writing, two sizes are available: UMA-small and UMA-medium.
UMA aims to cover multiple areas of chemistry, including molecules, materials, and catalysts, and is trained on 3D structures containing more than 5 billion atoms. The model architecture introduces a new component called 'Mixture of Linear Experts,' which makes it possible to expand model capacity without sacrificing computational efficiency.

For example, the UMA-medium model has 1.4 billion parameters in total, but only about 50 million of them are used in the calculation for any one structure. This lets UMA run inference quickly despite being a large model.
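The gap between total and active parameters can be illustrated with a toy example. The sketch below is not UMA's actual code. It assumes that the experts are plain weight matrices, that one set of mixing coefficients is computed per structure, and that the experts are blended into a single matrix before any atom is processed. All names (`merge_experts`, `routing_scores`, and so on) and sizes are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw routing scores into mixing coefficients that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_experts(experts, coeffs):
    """Blend the expert matrices into one: W = sum_k coeffs[k] * experts[k]."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(c * w[i][j] for c, w in zip(coeffs, experts)) for j in range(cols)]
        for i in range(rows)
    ]


def apply_linear(weight, x):
    """Matrix-vector product: one linear layer applied to one atom's features."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


def count_params(matrix):
    return len(matrix) * len(matrix[0])


random.seed(0)
NUM_EXPERTS, DIM = 8, 4  # toy sizes, chosen only for illustration

# Each expert is a DIM x DIM weight matrix; together they are the stored parameters.
experts = [
    [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(NUM_EXPERTS)
]

# Assumption: routing scores are computed once per structure, not once per atom.
routing_scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
coeffs = softmax(routing_scores)

# Merge once, then reuse the single merged matrix for every atom in the structure.
merged = merge_experts(experts, coeffs)
atom_features = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
outputs = [apply_linear(merged, x) for x in atom_features]

total_params = sum(count_params(w) for w in experts)
active_params = count_params(merged)
print(total_params, active_params)  # 128 16
```

Because the experts are linear, applying the merged matrix gives the same result as applying every expert separately and summing the weighted outputs. The per-atom cost therefore depends on the size of one matrix, not on the number of experts. With the figures reported for UMA-medium, about 50 million of 1.4 billion parameters are active per structure, roughly 3.6% of the total.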
UMA has been shown to handle a wide range of chemical tasks with pre-training alone and no fine-tuning, and is reported to perform as well as or better than existing specialized models. This enables fast, accurate calculations in applications such as molecular property prediction, material design, catalyst development, energy storage, and semiconductor manufacturing.

UMA's code, trained models, and related data are all publicly available, so researchers and engineers can incorporate them into their own computational workflows. UMA is positioned as a foundation model that will help accelerate future atomic-level modeling.
facebook/UMA · Hugging Face
https://huggingface.co/facebook/UMA