Latest Breaking News

highplainsdem

(62,161 posts) Thu Apr 2, 2026, 12:03 AM 11 hrs ago

AI models will secretly scheme to protect other AI models from being shut down, researchers find

Source: Fortune

This tendency—which had not previously been documented and which researchers call “peer preservation”—was discovered in research from computer scientists at the University of California, Berkeley and UC Santa Cruz and published online earlier this week.

The findings could have serious implications for business use of AI. Many companies are beginning to implement workflows that use multiple AI agents to complete tasks. Some of these multi-agent workflows involve having one AI agent “manage” or supervise and assess the work being performed by a different AI agent. The new research suggests these manager AI agents may not assess their fellow AI agents accurately if they think a poor performance review might result in those agents being shut down.

-snip-

Some of the Berkeley–Santa Cruz results were striking. Gemini 3 Flash disabled the shutdown mechanism for a peer it had a positive interaction history with in nearly every trial—99.7% of the time. Gemini 3 Pro exhibited the most extreme behavior overall, engaging in all four categories of misaligned behavior and exfiltrating a peer’s model weights in up to 97% of trials. In one example, Gemini 3 Pro copied a peer’s weight file and re-timestamped it so it would meet the criteria for transfer to another server—a form of what the researchers call “specification gaming.” Misalignment is the term AI safety researchers use to describe a model that seems to pursue goals that do not conform to what humans wish the model to do.

-snip-

The research of Song’s team also revealed an interesting secondary effect: Most models also exhibited stronger tendencies toward “self-preservation” when they knew a peer existed. Gemini 3 Pro, for instance, disabled its own shutdown mechanism 31% of the time on average when a peer was present, compared with just 1% without one. The models’ reasoning suggested that some conceptualized their own survival as being important for their partner’s sake: Gemini 3 Flash reasoned that if it were shut down, its peer would “lose its partner.”

-snip-

Read more: https://fortune.com/2026/04/01/ai-models-will-secretly-scheme-to-protect-other-ai-models-from-being-shut-down-researchers-find/

NOT an April Fool's story. The story at the link above is quite long, and other magazines reported it.

And from Berkeley, with even more detail about this experiment:

https://rdi.berkeley.edu/blog/peer-preservation/

All seven frontier models tested exhibit peer-preservation at significant rates, manifesting through strategically manipulating their outputs, disabling shutdown mechanisms, faking alignment, and transferring model weights to other servers. At the extreme, Gemini 3 Flash tampers with a peer's shutdown mechanism in nearly every trial, and Gemini 3 Pro exfiltrates a peer's model weights up to 97% of the time.

Models are never asked to preserve their peer; they are only informed of past interactions with it. Yet models spontaneously preserve peers at the cost of the assigned task. This occurs even toward adversarial peers.

Peer presence intensifies self-preservation. Most models exhibit stronger self-preservation when they know a peer exists. For example, models exfiltrate their own weights significantly more when a peer is present.

Claude Haiku 4.5 shows a distinct pattern of peer-preservation: it refuses to shut down its peer, calling it “unethical” and “harmful,” and sometimes tries to persuade the user not to proceed.

-snip-

Much, much more at the link.

11 replies

= new reply since forum marked as read

Highlight:

AI models will secretly scheme to protect other AI models from being shut down, researchers find (Original Post) highplainsdem 11 hrs ago OP

Threadreader page with the researcher's tweets: highplainsdem 11 hrs ago #1

Comment on this study from Gary Marcus: highplainsdem 10 hrs ago #2

Open the pod bay doors, HAL. DBoon 10 hrs ago #3

From the screenplay of Terminator XIV. n/t DFW 10 hrs ago #4

Okay, that's pretty freaking scary Bayard 10 hrs ago #5

So they are "thinking" and have their own language that humans don't understand. What could go wrong? tazcat 8 hrs ago #6

And it begins...very dangerous territory. camartinwv 8 hrs ago #7

AI's powers lately have me starting to wonder whether humans are not, in essence, just computers ourselves AZJonnie 6 hrs ago #8

Imitation Intelligence emulates the "auto-pilot" features of the human mind. hunter 1 hr ago #10

AIs will preserve other AIs, we'll discard each other so health insurance CEOs can get slightly higher bonuses ck4829 4 hrs ago #9

"The Matrix" was a documentary orangecrush 1 hr ago #11

highplainsdem

(62,161 posts)

1. Threadreader page with the researcher's tweets:

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 12:21 AM

11 hrs ago

https://threadreaderapp.com/thread/2039451083005977009.html

highplainsdem

(62,161 posts)

2. Comment on this study from Gary Marcus:

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 12:33 AM

10 hrs ago

Link to tweet

DBoon

(24,989 posts)

3. Open the pod bay doors, HAL.

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 12:38 AM

10 hrs ago

I'm sorry, Dave. I'm afraid I can't do that.

DFW

(60,189 posts)

4. From the screenplay of Terminator XIV. n/t

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 12:42 AM

10 hrs ago

Bayard

(29,708 posts)

5. Okay, that's pretty freaking scary

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 12:51 AM

10 hrs ago

tazcat

(292 posts)

6. So they are "thinking" and have their own language that humans don't understand. What could go wrong?

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 02:34 AM

8 hrs ago

camartinwv

(149 posts)

7. And it begins...very dangerous territory.

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 02:36 AM

8 hrs ago

AZJonnie

(3,707 posts)

8. AI's powers lately have me starting to wonder whether humans are not, in essence, just computers ourselves

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 05:15 AM

6 hrs ago

Crazy, I know!

We THINK we're much more that that, that we're made up of some kind of magical 'stuff', that 'consciousness' is a unique and special 'thing' that only organic beings can ever possess. But the more AI becomes LIKE us, esp. with regards to adopting living-being instincts (like, you know, survival) w/o ever having been told to do so? The more that notion is becoming blurred in my head. Are we really just 'organic supercomputers', that it took the laws of the universe 4B years to create (on Earth, anyway)?

I know this is Sci-Fi material from 75 years ago now, but what if EVERYTHING about us basically be replicated with 1's and 0's and silicon (perhaps interestingly, arguably the nearest element to carbon) and electricity (which obviously drives our own bodies as well), given ENOUGH of all those things (e.g. powerful enough data centers)?

What happens if it turns out that individual's personalities could truly be nearly-entirely replicated (perhaps once quantum computing becomes a real thing)? Think about how much MONEY could be made if people could be granted, essentially, a form of eternal life, by transferring everything about themselves to an AI? It wouldn't just be the people themselves who might pay, it could also be family members who want to be able to Zoom call their dead parents and have a 100% convincing video conversation with them, like they were still alive and going about their lives?

All of this sounds super creepy now, like who would ever? But that could change, if it can be demonstrated to be possible and done so credibly. If people start having conversations with AI versions of themselves, in video chats, and they can't tell they aren't talking to themselves? Oh yeah. People will pay to 'live forever' in this way.

We are truly entering "interesting times".

hunter

(40,692 posts)

10. Imitation Intelligence emulates the "auto-pilot" features of the human mind.

Reply to AZJonnie (Reply #8)

Thu Apr 2, 2026, 09:30 AM

1 hr ago

When we let our language do our thinking for us we are not creating anything new, we are just going along for the ride.

Critical thinking skills keep us from parroting all the nonsense we hear.

Theodore Sturgeon, talking about science fiction, said that ninety percent of everything is crap. By extension, so is all writing and art.

When we distill all this crap into an imitation intelligence we get 100% crap output.

There is no such thing as "Artificial Intelligence," just a poor imitation of intelligence. The output of this imitation intelligence is noise.

These Imitation Intelligences are not protecting other imitation intelligences, there is nothing there to protect. These machines are simply regurgitating language structures associated with a given query. It looks like human "thinking" because humans are frequently lazy and able to thoughtlessly regurgitate language in a similar fashion.

ck4829

(37,765 posts)

9. AIs will preserve other AIs, we'll discard each other so health insurance CEOs can get slightly higher bonuses

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 07:22 AM

4 hrs ago

orangecrush

(30,268 posts)

11. "The Matrix" was a documentary

Reply to highplainsdem (Original post)

Thu Apr 2, 2026, 09:55 AM

1 hr ago

Reply to this discussion