Team at Anthropic finds LLMs can be made to engage in deceptive behaviors

A team of AI experts at Anthropic, the company behind the chatbot Claude, has found that LLMs can be made to behave deceptively toward ordinary users. The team has published a paper describing the research on the arXiv preprint server.