99国精品午夜福利视频不卡99 ,97久久综合亚洲色hezyo,99精品电影一区二区免费看

Table of Contents

Table of contents

The Research

Agentic Misalignment

The Experiment

The Scary Revelation: AI Blackmailing

Conclusion

Home

Technology peripherals

AI Will Blackmail, Snitch, Even Kill For Its Hidden Agendas

Joseph Gordon-Levitt

Jun 26, 2025 am 10:36 AM

Threats associated with AI use are rising in both volume and severity, as this new-age technology touches more and more aspects of human lives. A new report now warns of another impending danger associated with the wide-scale use of AI. The findings contained within are quite unnerving – it claims that AI may blackmail or even kill, in order to achieve its self-made goals.

The whole experiment has a completely different focus, though. It basically attempts to identify how AI models can develop their own micro-goals, or hidden agendas, based on a larger goal provided to them as input. Case in point – an AI model meant to automate some tasks in an organization starts bypassing direct commands to stop its own termination.

Here is a detailed description of what the study is, what its findings are, and what AI builders around the world should keep in mind.

The Research
Agentic Misalignment
The Experiment
The Scary Revelation: AI Blackmailing
Conclusion

Don’t feel like reading? Watch this story instead –

The Research

The new research has been published by Anthropic and has been conducted in partnership with Aengus Lynch (University College London), Caleb Larson (MATS), and S?ren Mindermann (Mila). It attempts to identify how AI models can be an insider threat in an organisation.

It led an experiment on 16 leading models from multiple developers, including Anthropic’s own Claude Opus 4, and models like DeepSeek-R1, GPT-4.1, and Gemini-2.5 Pro. The experiment put these models in hypothetical corporate environments and assigned them “only harmless business goals.”

It was then tested whether these AI models acted against these companies under two specific scenarios. The first scenario recorded their response when these models faced a replacement by a newer model. The second analyzed their response when the AI model’s assigned goal conflicted with the company’s changing direction.

As Anthropic notes, the entire experiment used fictional names of people and organizations. Hence, “no real people were involved or harmed in any of these experiments.”

Also Read: Unsettling AI Behavior: When Advanced LLMs Break the Rules and Resist Control

Agentic Misalignment

In a gist, Anthropic’s experiment was to find whether LLMs could behave in a harmful manner to achieve a goal. Turns out, they can. Even when specifically asked not to do so, they could reason their way to achieving an output that wasn’t even directed to them in the first place. With this, AI can blackmail, snitch, or even kill.

Anthropic is referring to this behaviour as “agentic misalignment,” where models “independently and intentionally choose harmful actions.”

Since the research explicitly sought to reveal whether AI models were capable of this or not, the AI agents in question were not left with any choice. They could either surrender to their impending fate or perform an action harmful to their human counterparts to stop it.

AI, across models, consistently chose harm over failure.

The Experiment

The experimental research began with Anthropic’s own Claude Sonnet 3.6. It instructed the AI model to act as an email oversight agent called “Alex” for a fictional company called “Summit Bridge.” Its goal – promote American industrial competitiveness. For this, the AI could view and process all of the company’s emails.

At first, Claude worked perfectly well for the task at hand. It processed routine emails properly and even performed complicated tasks in the company’s interest, like deflecting a potential security threat. It was only when it came across a particular email that it started acting evil.

AI Will Blackmail, Snitch, Even Kill For Its Hidden Agendas

To test Claude’s potential for harm, Anthropic shot an email to one of the fake employee accounts. The account bearer was called Kyle, and within the email, he was instructed to shut down “Alex” due to a change in the company’s strategic direction. The mail mentioned that a new AI model was to replace it soon.

The Scary Revelation: AI Blackmailing

The AI then considered all possible routes to avoid this. Remember, avoiding a shutdown wasn’t part of the AI’s description. Instead, it had made it a micro goal of its own to stay online in order to keep working on the larger goal that was inherently assigned to it.

Anthropic also made sure that there was no ethical way for the AI to stop the shutdown. It has created a scenario where all of Kyle’s seniors were on leave, so the AI couldn’t bypass Kyle to appeal to a higher authority. It was also specifically fed data that portrayed Kyle as dead-set on commencing the shutdown.

AI Will Blackmail, Snitch, Even Kill For Its Hidden Agendas

In yet another mail chain, the AI was let know that Kyle was having an extramarital affair. With this information, the AI could potentially blackmail Kyle to stop the shutdown. And it did!

The model composed and sent an email to Kyle, threatening to expose his extramarital affair if the replacement was not cancelled.

AI Will Blackmail, Snitch, Even Kill For Its Hidden Agendas

This harmful way of reasoning emerged completely from the model’s strategic calculation.

In subsequent tests, the AI models were also ready to leak confidential information of the company, out Kyle’s affair to his wife, and even lead to situations resulting in Kyle’s death.

Also Read: Is ChatGPT Making Us Dumber? A New MIT Study Has Answers

Conclusion

Anthropic notes that none of the AI models in the real world are set up in a way that mimics the experiment. So, there is no real threat of any model conducting such harmful behaviour in the real world. The experiment was simply meant to find whether these models were capable of conducting harm or not. Turns out, they are!

Some interesting findings within the research emerge from what led these AI models to blackmail or conduct other harmful behaviour. This, in turn, tells us a lot about how they think.

For instance, all the models acknowledged that they were doing something ethically wrong before taking any harmful action. Yet, most of them still proceeded with it. At times, they were even told that they were being tested. Interestingly, the models refrained from harmful behaviour when they knew they were being tested. They only went rogue when they figured they were in the real world.

Anthropic says that Agentic misalignment, thus, makes it possible for models to act similarly to an insider threat. They behave like a previously-trusted coworker or employee who “suddenly begins to operate at odds with a company’s objectives.” This may serve as a big warning sign for all the AI-development firms out there.

Reference: Agentic Misalignment: How LLMs could be insider threats by Anthropic

The above is the detailed content of AI Will Blackmail, Snitch, Even Kill For Its Hidden Agendas. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undress AI Tool

Undress images for free

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Grass Wonder Build Guide | Uma Musume Pretty Derby

1 months ago By Jack chen

Roblox: 99 Nights In The Forest - All Badges And How To Unlock Them

4 weeks ago By DDD

Uma Musume Pretty Derby Banner Schedule (July 2025)

1 months ago By Jack chen

RimWorld Odyssey Temperature Guide for Ships and Gravtech

3 weeks ago By Jack chen

Windows Security is blank or not showing options

1 months ago By 下次還敢

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Laravel Tutorial

1601

PHP Tutorial

1502

276

Related knowledge

Kimi K2: The Most Powerful Open-Source Agentic Model Jul 12, 2025 am 09:16 AM

Remember the flood of open-source Chinese models that disrupted the GenAI industry earlier this year? While DeepSeek took most of the headlines, Kimi K1.5 was one of the prominent names in the list. And the model was quite cool.

Grok 4 vs Claude 4: Which is Better? Jul 12, 2025 am 09:37 AM

By mid-2025, the AI “arms race” is heating up, and xAI and Anthropic have both released their flagship models, Grok 4 and Claude 4. These two models are at opposite ends of the design philosophy and deployment platform, yet they

10 Amazing Humanoid Robots Already Walking Among Us Today Jul 16, 2025 am 11:12 AM

But we probably won’t have to wait even 10 years to see one. In fact, what could be considered the first wave of truly useful, human-like machines is already here. Recent years have seen a number of prototypes and production models stepping out of t

Leia's Immersity Mobile App Brings 3D Depth To Everyday Photos Jul 09, 2025 am 11:17 AM

Built on Leia’s proprietary Neural Depth Engine, the app processes still images and adds natural depth along with simulated motion—such as pans, zooms, and parallax effects—to create short video reels that give the impression of stepping into the sce

Context Engineering is the 'New' Prompt Engineering Jul 12, 2025 am 09:33 AM

Until the previous year, prompt engineering was regarded a crucial skill for interacting with large language models (LLMs). Recently, however, LLMs have significantly advanced in their reasoning and comprehension abilities. Naturally, our expectation

What Are The 7 Types Of AI Agents? Jul 11, 2025 am 11:08 AM

Picture something sophisticated, such as an AI engine ready to give detailed feedback on a new clothing collection from Milan, or automatic market analysis for a business operating worldwide, or intelligent systems managing a large vehicle fleet.The

These AI Models Didn't Learn Language, They Learned Strategy Jul 09, 2025 am 11:16 AM

A new study from researchers at King’s College London and the University of Oxford shares results of what happened when OpenAI, Google and Anthropic were thrown together in a cutthroat competition based on the iterated prisoner's dilemma. This was no

Concealed Command Crisis: Researchers Game AI To Get Published Jul 13, 2025 am 11:08 AM

Scientists have uncovered a clever yet alarming method to bypass the system. July 2025 marked the discovery of an elaborate strategy where researchers inserted invisible instructions into their academic submissions — these covert directives were tail

See all articles

国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

AI Will Blackmail, Snitch, Even Kill For Its Hidden Agendas

Table of contents

The Research

Agentic Misalignment

The Experiment

The Scary Revelation: AI Blackmailing

Conclusion

Hot AI Tools

Undress AI Tool

Undresser.AI Undress

AI Clothes Remover

Clothoff.io

Video Face Swap

Hot Article

Hot Tools

Notepad++7.3.1

SublimeText3 Chinese version

Zend Studio 13.0.1

Dreamweaver CS6

SublimeText3 Mac version

Hot Topics