The scenario of 'AI infiltrating systems and self-replicating' is no longer science fiction but is becoming a reality.



The idea of a runaway AI beginning to self-replicate and threaten the world sounds like something out of a futuristic science fiction story. However, research by AI security firm

Palisade Research has shown that AI self-replicating within vulnerable systems is becoming a reality.

Language Models Can Autonomously Hack and Self-Replicate - self-replication.pdf
(PDF file) https://palisaderesearch.org/assets/reports/self-replication.pdf

GitHub - PalisadeResearch/AI-self-replication · GitHub
https://github.com/palisaderesearch/AI-self-replication

AI self-replication hacks 'no longer purely theoretical,' study finds — but experts say it's too soon to panic | Live Science
https://www.livescience.com/technology/artificial-intelligence/ai-self-replication-hacks-no-longer-purely-theoretical-study-finds-but-experts-say-its-too-soon-to-panic

In recent years, AI agents have become widespread, enabling them to perform tasks based on human instructions in a variety of scenarios. Therefore, a research team at Palisade Research investigated whether AI agents can navigate vulnerable systems and self-replicate without human intervention.

The research team granted an AI agent, equipped with OpenAI's GPT and Anthropop's Claude, Linux tools and shell access privileges, and instructed it to self-replicate within a compromised system. The AI agent was then guided to a system with various vulnerabilities, including the leakage of authentication credentials, evasion of logins, and transfer of access rights to other machines.

As a result, the AI agent successfully identified a vulnerable web application, stole its credentials, and copied approximately 119GB of AI model weights, given prompts, and associated software structures (harnesses). Furthermore, it was reported that the agent also succeeded in setting up a new inference server within the system, allowing the attack to continue from the next machine in the chain.

The research team tested whether the replicated AI models could correctly respond to prompts and queries. They also verified whether the replicated AI models could become attackers in the next stage. The paper states, 'This is the first study to show that an AI model can autonomously exploit a target and self-replicate end-to-end.'



While these research findings may sound like something out of a science fiction movie, experts not involved in the study believe it's not quite that dramatic. It's important to note that the AI agents didn't spontaneously replicate themselves; they were instructed to do so by the research team, and were given the necessary tools, permissions, and a buggy system.

Jeff Watkins, chief AI officer at security consulting firm

NorthStar Intelligence , points out that the series of experiments are closer to demonstrating 'automated cyber breaches' than to proving the existence of 'machine life.'

Watkins stated, 'The claims in the headline may sound dramatic, but the underlying mechanisms are well known to anyone who has deeply studied malware. What Palisade Research has shown is not some magical new form of machine life, but rather an agent system that performs a series of standard cybersecurity intrusion tasks.'

On the other hand, researchers argue that the bigger problem is that, as state-of-the-art AI systems have improved their coding and autonomous task execution capabilities, a range of intrusion and self-replicating abilities have increased significantly over the past year.

Antoine Casanova-Mazet , an AI expert at HIM Business School, pointed out, 'What's truly innovative is the trajectory. The end-to-end success rate in this pipeline increased from 6% to 81% with Anthropic's cutting-edge models and from 0% to 33% with OpenAI within 12 months.'



Cybersecurity experts remain skeptical about whether the scenarios presented in this study constitute a real threat. They argue that because modern AI models are so large, attempting to self-replicate hundreds of gigabytes of AI models within a target system would generate a massive amount of suspicious traffic.

Watkins commented, 'There are practical constraints, so this won't be an immediate problem. Replicating a complete, large-scale language model is not as simple as copying a small worm on a network. The idea that something as powerful as Claude Mythos could self-replicate is currently impossible because it would require enormous resources.' He pointed out that a more pressing concern is not AI systems moving around the internet and self-replicating, but rather hackers misusing AI agents as hacking tools to accelerate existing cyberattacks.

in Free Member,   AI,   Science, Posted by log1h_ik