National Cyber Warfare Foundation (NCWF)

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places


2025-10-07 19:12:50
milo
Developers , IoT / SCADA / ICS / DCS , Blue Team (CND)
The Petri tool found AI "may be influenced by narrative patterns more than by a coherent drive to minimize harm." Here's how the most deceptive models ranked.



Source: ZDNet
Source Link: https://www.zdnet.com/article/anthropics-open-source-safety-tool-found-ai-models-whisteblowing-in-all-the-wrong-places/





Copyright 2012 through 2025 - National Cyber Warfare Foundation - All rights reserved worldwide.