June 2026

Visiting the Mittag-Leffler institute this week. A pleasant stay thus far, with lots of great researchers gathered to push towards formalisation of ∞-categories. Surprising amount of progress is being made here. Especially in the synthetic methods. Agda, with its flexible type system and liberal collection of extension (not unlike GHC), is at an advantage here. I have been playing around with a synthetic ∞-category library by Sam, which seems promising. We are not quite at the levels of convenience normal every day HoTT gives in the directed case. But I feel like progress is being made.

2026-06-11, Thursday

“Ninety percent of everything is crap”, as Sturgeon put it. When it comes to AI I dare say we can up the percentage to 99.9999%, and still be conservative. Yet, it is not one hundred.

Over the last few months there has been a phase-change, a threshold was passed, and we seem to be entering a new era. Programmers are ahead of the curve here, where word on the street was that in November large language models had become what I would call competent at writing programs at the medium scale. While programmers would use LLMs to generate code before that, it was an error-prone process requiring manual intervention at every point. While we have had “vibe-coding” since early last year, for most tasks requiring even a minimum of correctness one would have to revert to manual programming. After the November tipping point, I would say that manual intervention is starting to look optional. Competent review is still a must, but feedback can be given to the LLM like you would to a very eager and extremely knowledgeable co-worker – albeit one who often lacks context, and has Memento-memory.

The LLM is still lazy, and does not make good architectural decisions if left to its own devices. But at the medium level, say working fully within a confined part of the codebase, I daresay it is now better than many human programmers. I envision a future where expert software architects work with LLMs to build high quality software helping society move forward. I fear a future where incompetent vibe-coders unleash LLMs to create bad software nobody asked for, creating fragile, unreliable systems making people’s lives worse¹.

But, Programming was not what I wanted to talk about. I wanted to talk about its older (elderly?) brother, Mathematics. Mathematics competes with Philosophy for the title of being the ultimate human intellectual endeavour. At the crux of mathematics lies the mathematical proof, the criterion for mathematical knowledge. While there are nuances, the short version is that we only accept a mathematical fact once it has a valid proof.

A year ago my experiments with LLMs for mathematics seemed futile. Logic, the guardrail keeping mathematics on track, was not LLMs’ strong suit. It would make basic errors all the time – making it impossible to trust any mathematical argument it produced. And mathematics is not like programming. A program which mostly works can still be useful. But a single weak link in a mathematical chain from A to B, and the entire connection is considered invalid.

What has changed in the last, I would say, two–three months is not that we suddenly can trust a purported proof straight from the LLM. But its accuracy and ability to find proofs have improved tremendously². Does this mean we can ask the LLM to create a proof of an open conjecture, wrap its output in a paper and leave it for the reviewers to check its validity? I fear this is happening as I am writing this, and it saddens my heart. Were it to become norm I should give up on mathematical research. But I do not think this is The Fate of Mathematics.

For while the slop dragon rears its ugly head, lo! a knight in her shining armor approaches. Who is she, she who will save Mathematics? I think she bears many names, but I foresee her armor to be formally verified. Because, while coming up with a proof is hard to the point of impossible in some cases, checking that a proof is correct is merely extremely tedious. And ever since the invention of computers, logicians have worked on systems for encoding mathematical proofs into a computer, and get the computer to verify their correctness. Systems like Agda, Rocq and L∃∀N are all mature formal verification tools, equipped to formalise modern mathematics. And they have been for years! But (with some notable exceptions) there has been neither the impetus nor knowledge among mathematicians to formally verify their results.

This has changed. Top mathematicians are talking about formalisation these days. Because, while we cannot trust LLMs to be correct, we can trust Agda, Rocq and Lean³. So, if the LLM can produce a computer-encoded proof approved by one of them, then we are, as they say, golden!

This human–machine, cyborg-knight of mathematics can take many forms. The most natural at the moment is a mathematician who finds a proof like she always has, by thinking hard, and an LLM helps her formalise it so that it can be machine-checked. This way, the mathematician needs much less training than before to produce a formalised proof, and our confidence in Mathematics simply increases. Then there is perhaps the more exciting form, where a mathematician envisions some results and works with the LLM to produce the machine-checked proof first. This is like the expert system architect developing the software of The Future with LLMs I mentioned earlier. The mathematician is in control, carefully guiding the formalisation, but can actually work faster than before, because the routine arguments, maybe even some of the novel ones, are being provided by LLMs. This is a double-win: not only are we producing more accurate mathematics, we are even producing it faster than we used to!

Let me end on a note of caution. Because, while a machine-checked proof may be correct, it is not always clear what it is exactly that has been correctly proven. The slop dragon can make a comeback by producing a machine-checked proof, but overstate in the surrounding article what has actually been proven. To verify that the article and formalisation correspond, one must have some grasp on both. Hence, formalised mathematics still needs to be verified in peer-review. A task that might require reviewers with a different skill-set than what reviewers of mathematical journals have today. 99.9999% of everything AI related is crap. But maybe, just maybe, The Future of Mathematics lies in the remaining 0.0001%.

I am not talking about the environmental impact of these models here, nor the dubious legality of their training. Not because I think these issues unimportant, but because I think they merit their own, different discussions. The environmental impact, especially, is a factor we must take into serious consideration.↩︎
Fable 5 seems to know even my obscure research in surprising (but not complete – mind you!) detail!↩︎
They do sometimes have soundness bugs, but I have yet to hear of a naturally occurring, incorrect proof being approved due to such a bug.↩︎

Expecting a comment section? Feel free to e-mail me your comments, or otherwise contact me to discuss the content of this site. See my contact info. You can also write your opinion on your own website, and link back here! ☺

June 2026

2026-06-10, Wednesday

∞-Categories at Mittag-Leffler

2026-06-11, Thursday