Image

Researchers from high AI labs warn they might be shedding the flexibility to grasp superior AI fashions

AI researchers from leading labs are warning that they could soon lose the ability to understand advanced AI reasoning models.

In a position paper published last week, 40 researchers, including those from OpenAI, Google DeepMind, Anthropic, and Meta, called for more investigation into AI reasoning models’ “chain-of-thought” process. Dan Hendrycks, an xAI safety advisor, is also listed among the authors.

The “chain-of-thought” process, which is visible in reasoning models such as OpenAI’s GPT-4o and DeepSeek’s R1, allows users and researchers to monitor an AI model’s “thinking” or “reasoning” process, illustrating how it decides on an action or answer and providing a certain transparency into the inner workings of advanced models.

The researchers said that allowing these AI systems to “‘think’ in human language offers a unique opportunity for AI safety,” as they can be monitored for the “intent to misbehave.” However, they warn that there is “no guarantee that the current degree of visibility will persist” as models continue to advance.

The paper highlights that experts don’t fully understand why these models use CoT or how long they’ll keep doing so. The authors urged AI developers to keep a closer watch on chain-of-thought reasoning, suggesting its traceability could eventually serve as a built-in safety mechanism.

“Like all other known AI oversight methods, CoT [chain-of-thought] monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise, and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods,” the researchers wrote.

“CoT monitoring presents a valuable addition to safety measures for frontier AI, offering a rare glimpse into how AI agents make decisions. Yet, there is no guarantee that the current degree of visibility will persist. We encourage the research community and frontier AI developers to make the best use of CoT monitorability and study how it can be preserved,” they added.

The paper has been endorsed by major figures, including OpenAI co-founder Ilya Sutskever and AI godfather Geoffrey Hinton.

Reasoning Models

AI reasoning models are a type of AI model designed to simulate or replicate human-like reasoning—such as the ability to draw conclusions, make decisions, or solve problems based on information, logic, or learned patterns. Advancing AI reasoning has been viewed as a key to AI progress among major tech companies, with most now investing in building and scaling these models.

OpenAI publicly released a preview of the first AI reasoning model, o1, in September 2024, with competitors like xAI and Google following close behind.

However, there are still a lot of questions about how these advanced models are actually working. Some research has suggested that reasoning models may even be misleading users through their chain-of-thought processes.

Despite making big leaps in performance over the past year, AI labs still know surprisingly little about how reasoning actually unfolds inside their models. While outputs have improved, the inner workings of advanced models risk becoming increasingly opaque, raising safety and control concerns.

SHARE THIS POST