A Gold Medal Moment for AI

Ron Green, Co-founder, Chief Technology Officer

An LLM just earned a gold medal at the International Mathematical Olympiad.

That might sound impressive on its own, but it's actually much bigger news than most people think, because of how it happened. The model didn't memorize past solutions. It didn't rely on a formal proof engine. It didn't use search tools, internet access, or any special testing setup.

It read the official 2025 IMO questions in natural language. Thought for hours. Wrote multi-page, human-readable proofs. And scored 35 out of 42, meeting the threshold for a human gold medal.

To put this in perspective: this is a competition where the average problem takes 90 to 120 minutes to solve. Most math PhDs wouldn’t get close. And yet here we are, with two models, one from Google DeepMind and another from OpenAI, achieving scores that would place them among the top competitors in the world.

This isn’t just about getting the right answers. It’s about how they got there.

OpenAI and Google say these weren’t task-specific models. No engineered solutions. Just new techniques in general-purpose reasoning and reinforcement learning.

The implications are wild. For years, we’ve assumed that language models could explain math but not do math, at least not in this way. Now we have proof (literally) that they can handle extended reasoning across dozens of steps, checking their own work, adapting strategies, and even knowing when they don’t know the answer.

Alexander Wei and Noam Brown, two leaders of the OpenAI team, said they used an experimental model trained with a completely different approach: hard-to-verify, multi-page proofs and general-purpose reinforcement learning that broke free of the usual scalar reward system. In Alexander's words, this model can "craft intricate, watertight arguments at the level of human mathematicians." It thinks longer, reasons more deeply, and does it all in natural language.

Noam added, “When you work at a frontier lab, you usually know where frontier capabilities are months before anyone else. But this result is brand new, using recently developed techniques. It was a surprise even to many researchers at OpenAI. Today, everyone gets to see where the frontier is.”

If previous LLMs were speed chess players, this one sat down and won a long-form endgame.

Yes, there are caveats. The models took advantage of parallel processing, just like you’d expect from systems not constrained by biology. Details are still scarce. But let’s be clear: the bar was not low. These are deeply creative, unstructured problems. The kind humans solve with intuition, patience, and years of training.

There's a famous quote: "The remarkable thing isn't that the horse dances well, but that it dances at all." We just watched the horse not only dance, but do it in formal wear, with choreography, and earn a standing ovation from the judges.

This feels like a turning point. Maybe not the moon landing, but something closer to the Wright brothers at Kitty Hawk. Short flight, big lift. And it raises the question: if today's models can pull this off with enough time and tokens, what will tomorrow's models do?

It’s no longer a question of if AI can think in long, complex sequences. It can. Now we have to ask: what do we want it to think about? And are we ready for the answers?

This is not the end of human creativity. It is the start of something new, where creativity becomes richer, more vibrant, and more connected than ever before.

As we stand on this new frontier, the collaboration between human insight and AI's relentless reasoning promises to unlock doors we never imagined. The future of mathematics, science, and problem-solving is no longer a solitary endeavor; it's a shared journey. And the horse? It's not just dancing anymore; it's leading the way.
