National Cyber Warfare Foundation (NCWF)

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

2024-01-13 21:53:11
milo
Developers , Blue Team (CND) , Education

Kyle Wiggers / TechCrunch:

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors — Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, the answer seems — and terrifyingly, they're exceptionally good at it.

Source: TechMeme
Source Link: http://www.techmeme.com/240113/p9#a240113p9


Copyright 2012 through 2025 - National Cyber Warfare Foundation - All rights reserved worldwide.