Google launches DeepSomatic AI for highly accurate cancer gene mutation detection, accelerating cancer research worldwide with open source

Google Research, Google's research and development division, has announced DeepSomatic, an open-source AI tool for identifying genetic mutations in cancer, developed in collaboration with the University of California, Santa Cruz, and other partners.
Using AI to identify genetic variants in tumors with DeepSomatic
https://research.google/blog/using-ai-to-identify-genetic-variants-in-tumors-with-deepsomatic/
Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic | Nature Biotechnology
https://www.nature.com/articles/s41587-025-02839-x
Today, @GoogleResearch announced DeepSomatic, a new machine learning model developed with our partners, including @ucscgenomics and @ChildrensMercy , that accurately identifies genetic variants in cancer cells — a critical step for delivering more precise treatments for patients.…
— Google (@Google) October 16, 2025
Genetic analysis of cancer requires accurately distinguishing inherited (germline) variants, which come from a person's parents and are present in every cell of the body, from acquired (somatic) variants, which arise from ultraviolet light, chemicals, or random errors during DNA replication.
DeepSomatic is an extension of DeepVariant, a tool for discovering germline variants. It converts the genome sequencing data of tumor cells and normal cells into images and analyzes them with a convolutional neural network (CNN). This image analysis separates small errors introduced during sequencing from genuine genetic variants, allowing cancer-specific somatic mutations to be detected with high accuracy.
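To make the "sequence data as images" idea concrete, here is a minimal toy sketch. It is not DeepSomatic's actual encoding, which this article does not describe in detail; the function names (`encode_read`, `encode_pileup`, `mismatch_mask`), the base codes, and the example reads are all hypothetical. It stacks a reference sequence, normal reads, and tumor reads into a grid of numbers, the kind of array an image model such as a CNN could take as input.

```python
# Toy illustration only: NOT DeepSomatic's real image format.
# Base codes are an arbitrary assumption; 0 means "read does not cover this column".
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode_read(read_start, read_seq, window_len):
    """Return one image row: a base code per column, 0 where the read has no coverage."""
    row = [0] * window_len
    for i, base in enumerate(read_seq):
        col = read_start + i
        if 0 <= col < window_len:
            row[col] = BASE_CODES.get(base, 0)
    return row


def encode_pileup(reference, normal_reads, tumor_reads):
    """Stack the reference, then normal reads, then tumor reads into a 2D grid."""
    n = len(reference)
    rows = [encode_read(0, reference, n)]
    rows += [encode_read(start, seq, n) for start, seq in normal_reads]
    rows += [encode_read(start, seq, n) for start, seq in tumor_reads]
    return rows


def mismatch_mask(pileup):
    """Second 'channel': 1 where a covered base differs from the reference row."""
    ref = pileup[0]
    return [
        [1 if cell != 0 and cell != ref[col] else 0 for col, cell in enumerate(row)]
        for row in pileup
    ]


# Hypothetical reads as (start offset, sequence). The tumor reads carry a T>A
# change at column 3 that the normal reads do not show.
reference = "ACGTAC"
normal_reads = [(0, "ACGTAC"), (1, "CGTA")]
tumor_reads = [(0, "ACGAAC"), (2, "GAAC")]

image = encode_pileup(reference, normal_reads, tumor_reads)
mask = mismatch_mask(image)

for row, flags in zip(image, mask):
    print(row, flags)
```

In this toy grid, a column where the tumor rows differ from the reference while the normal rows agree with it is the pattern expected of a somatic variant. A mismatch that appears in only one read is more likely a sequencing error, which is the distinction the CNN is trained to make.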

According to the research team, this method allows DeepSomatic to outperform existing analysis tools, with particularly large improvements in identifying 'indels,' insertions or deletions of parts of the genetic code that have previously been difficult to detect.
DeepSomatic's performance is supported by a high-quality training dataset called 'CASTLE.' The dataset combines data from three major sequencing platforms for breast and lung cancer samples, and is said to achieve high accuracy by removing the errors specific to each platform.
For indel detection on data from Illumina, a leading sequencing platform, DeepSomatic achieved an F1 score of 90%, while existing tools scored approximately 80%. On PacBio data, existing tools scored below 50%, while DeepSomatic scored over 80%.
DeepSomatic has also been shown to maintain high performance under difficult analytical conditions or with limited data, such as old tissue samples fixed in formalin and exome sequencing data, which covers only the protein-coding portion of the genome.
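For readers unfamiliar with the F1 score used in this comparison: it is the harmonic mean of precision (the share of reported variants that are real) and recall (the share of real variants that were reported). The sketch below shows the standard calculation. The function names and the counts are invented for illustration and are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical caller: 90 real indels found, 10 false calls, 10 real indels missed.
precision, recall = precision_recall(tp=90, fp=10, fn=10)
example_f1 = f1_from_counts(tp=90, fp=10, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} F1={example_f1:.2f}")
```

Because F1 penalizes both false calls and missed variants, a tool cannot raise it simply by reporting more candidates or by reporting fewer.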

Google Research also claims that DeepSomatic can be applied to completely different types of cancer, not just the breast and lung cancers it was trained on. In fact, when analyzing samples of glioblastoma, a highly malignant brain tumor, it was able to accurately identify the genetic mutations that cause the disease.
The team also analyzed childhood leukemia, the most common cancer in children. Because leukemia is a cancer of the blood, it is difficult to collect normal blood cells for comparison. Even so, in a 'tumor-only' analysis using only cancer cell data, DeepSomatic identified the already known mutations and discovered 10 new ones.

Google Research says, 'Google Research makes fundamental breakthroughs that have a real, tangible impact on people. We do this work because the path to the future is based on research that can make reality better for people.'
DeepSomatic is available under the BSD license and its repository is available on GitHub.
GitHub - google/deepsomatic: DeepSomatic is an analysis pipeline that uses a deep neural network to call somatic variants from tumor-normal and tumor-only sequencing data.
https://github.com/google/deepsomatic
The CASTLE dataset is also hosted on GitHub.
GitHub - CASTLE-Panel/castle: CAncer Standards Long-read Evaluation
https://github.com/CASTLE-Panel/castle