National Cyber Warfare Foundation (NCWF) Forums


Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors


2024-01-13 21:53:11
milo
Developers , Blue Team (CND) , Education


Kyle Wiggers / TechCrunch:

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

Most humans learn the skill of deceiving other humans. So can AI models learn the same? The answer, it seems, is yes, and, terrifyingly, they're exceptionally good at it.







Source: TechMeme
Source Link: http://www.techmeme.com/240113/p9#a240113p9


 



© Copyright 2012 through 2024 - National Cyber War Foundation - All rights reserved worldwide.