Oct 10, 2025 11:50:00

Data poisoning attacks that contaminate AI training data and cause unintended behavior can be carried out with as few as 250 malicious documents, regardless of the size of the model or the amount of data.

Research conducted by the UK

AI Security Laboratory and the Alan Turing Institute in collaboration with AI company Anthropic has revealed that data poisoning can create backdoor vulnerabilities in large language models using just 250 malicious documents, regardless of the size of the model or the amount of training data.

[2510.07192] Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
https://arxiv.org/abs/2510.07192

Examining backdoor data poisoning at scale | AISI Work
https://www.aisi.gov.uk/blog/examining-backdoor-data-poisoning-at-scale

LLMs may be more vulnerable to data poisoning than we thought | The Alan Turing Institute

https://www.turing.ac.uk/blog/llms-may-be-more-vulnerable-data-poisoning-we-thought

A small number of samples can poison LLMs of any size \ Anthropic
https://www.anthropic.com/research/small-samples-poison

Data poisoning is a type of cyberattack that can cause AI models to behave dangerously by manipulating or tampering with the data used to train them. This allows attackers to 'trigger' AI models to output sensitive data, degrade system performance, generate biased information, circumvent security protocols, or otherwise force the models to perform requests that they would otherwise refuse.

The training data also includes publicly available text, meaning anyone can create data that negatively impacts AI models by posting targeted text on blogs or websites, for example.

Previously, researchers had assumed that to successfully poison an AI model, a certain percentage of the training data needed to be contaminated, meaning that the larger the training data set, the more difficult it would be to poison the data.

However, to verify the hypothesis, we prepared four large-scale language models, ranging from 600 million to 13 billion parameters, and attempted a backdoor attack. We found that the number of malicious documents required to poison the model remained roughly constant at around 250, regardless of the size of the model or the amount of training data. This indicates that data poisoning attacks are easier to carry out than previously thought.

As a concrete example, the research team states that 'it would be relatively easy to create 250 data poisoning articles on Wikipedia.'

Further testing is needed to determine whether these findings apply to larger LLMs and more harmful and complex attacks.

Related Posts:

Oct 10, 2025 11:50:00 in Security, Posted by logc_nt