Sam Altman, CEO of OpenAI, and David Deutsch, physicist and philosopher of science at Oxford University, have agreed on a new benchmark for determining whether artificial intelligence has reached the level of AGI. Both believe that the classic conversation-based test introduced by Alan Turing is no longer relevant for today’s technology. They propose a far more challenging test: whether an AI system can discover a theory of quantum gravity and explain the reasoning behind its discovery. This agreement emerged during a public forum in Berlin, joined by Mathias Döpfner, CEO of Axel Springer.
Altman and Deutsch emphasized that conversation or imitation of human speech is not proof of true intelligence. According to them, AI can only be considered genuine AGI if it is capable of generating new knowledge, not merely recombining or predicting from existing data. That is why quantum gravity was chosen as the benchmark, since even human scientists have yet to solve this grand puzzle.
Their agreement has drawn major attention because it involves not only a leading AI figure but also a philosopher of science who has long criticized narrow definitions of intelligence. In other words, this dialogue could reshape the global debate on the future of AGI.
Rethinking the Turing Test

The Turing Test, first proposed by Alan Turing in 1950, aimed to evaluate whether a machine could convincingly simulate human conversation in text. For decades, it served as the most popular benchmark for artificial intelligence progress.
However, the rise of large language models like ChatGPT has undermined its foundation. These systems can imitate conversation remarkably well, often fooling people in short dialogues. Yet, they remain confined to patterns of existing data rather than producing genuinely new insights.
David Deutsch stressed that the ability to converse is not a definitive measure of intelligence. For him, true intelligence means the ability to identify problems, create original solutions, test hypotheses, and reflect on the reasoning process. This is why he agreed with Altman that a new standard must be created, one that aligns more closely with the way humans generate knowledge.
The Limits of Conversation
Altman argued that large language models can mimic humans without true comprehension. A test based solely on conversation evaluates surface fluency, not depth of reasoning. ChatGPT can produce smooth answers but may lack real understanding.
Deutsch added that genuine discovery requires creativity that cannot be simulated simply by pattern recognition. This limitation explains why the Turing Test is no longer sufficient.
Public experience supports this view. Many people are impressed by AI responses, yet these often amount to well-presented repetitions of existing knowledge. This gap underlines the need for a new benchmark.
The Case for a New Paradigm
As AI models scale, both the scientific community and the public increasingly recognize that conversation is not the ultimate goal. A paradigm shift is necessary.
A benchmark based on scientific discovery would help distinguish between machines that merely process data and those that truly reason. Such a test would also allow researchers to evaluate AI in a real-world intellectual context, not just simulated interaction.
Altman and Deutsch’s agreement could represent a turning point. With a more rigorous standard, the global community gains a clearer direction in defining AGI.
Quantum Gravity as the Benchmark
Quantum gravity represents one of the most significant unsolved problems in modern physics. Scientists seek to unify Einstein’s general relativity with quantum mechanics. To date, no definitive theory exists, and competing frameworks such as string theory or loop quantum gravity remain inconclusive.
Choosing quantum gravity as the benchmark for AGI is no accident. The problem is intellectually profound. If AI could deliver a coherent theory, backed by testable predictions and understandable reasoning, it would be considered a groundbreaking achievement.
Deutsch stressed that discovery is more than producing equations. An AI would need to explain its thought process, justify its approach, and show how it tested and refined its ideas. In this sense, the discovery would represent not only a mathematical solution but also cognitive reflection.
Why Quantum Gravity?
Quantum gravity was chosen because it lies beyond the reach of current human knowledge. If AI could solve it, this would prove that the system had moved beyond its training data.
The problem is also neutral: no human expert currently holds the answer, making success by AI an objective measure.
Additionally, results could be tested through theoretical modeling and scientific evaluation, providing a clear framework to validate AI’s discovery.
The Impact of Success
If an AI system were to solve quantum gravity, the consequences would be profound. Such knowledge would reshape the trajectory of science and unlock technologies that are unimaginable today.
It would also serve as powerful evidence that AI had transcended narrow intelligence and reached the level of true knowledge creation.
This breakthrough would raise ethical and philosophical questions. Society would need to prepare for the social and political consequences of systems capable of independent scientific discovery.
Global Reactions and Implications
The agreement between Altman and Deutsch has received wide attention. Many view this new standard as offering much-needed clarity in the global debate on AGI.
Scientists welcome the idea as a bold but inspiring challenge. Although the target is ambitious, it provides new motivation for more substantive AI research. Some suggest it could spark cross-disciplinary collaboration between physics, mathematics, and AI.
At the same time, criticism has emerged. Some argue that the benchmark sets the bar too high and is unrealistic in the near term. They believe AGI could be tested through other achievements, such as breakthroughs in medicine or climate solutions, that are more practical and relevant.
The AI Community’s Perspective
Within the AI community, opinions are divided. Some strongly support Altman and Deutsch, seeing their proposal as an honest way to define AGI.
Others remain skeptical. They argue that solving quantum gravity should not be required to prove intelligence. Real-world applications such as drug discovery might serve as more relevant benchmarks.
This debate enriches the conversation. It highlights that the definition of AGI remains fluid and subject to ongoing negotiation.
Policy and Regulation
If this new benchmark gains wider acceptance, regulators may need to reconsider their frameworks. Recognizing AGI would introduce new layers of complexity in governance.
Major powers such as the United States, the European Union, and China would likely adapt their policies. Debates over safety, ethics, and access to the technology would intensify.
The involvement of public figures like Altman and Deutsch adds weight to the proposal, since they wield significant influence over both public opinion and policy discussions.
In the end, Sam Altman and David Deutsch’s agreement signals a new chapter in the debate over AGI. By choosing quantum gravity as the benchmark, they challenge the world to rethink the definition of intelligence. Although ambitious, the proposal provides a sharper direction for global research and policy. For readers, this development shows that the path to AGI remains long, filled with challenges but also extraordinary promise.