My thoughts on GPT-5.

Ever since GPT-5 launched last month, I've been testing it extensively, fascinated to find out how much higher the ceiling is for OpenAI, especially in such a hostile, fast-moving market.

I tried, I observed, I iterated. But I also noticed something.

No matter how hard I tried, I could not pin down an opinion on this model, because it feels good, bad, and like nothing we've ever had before, all at once.

And after a few weeks of pushing it, and really testing it in ways that would have been highly unconventional before, I think it simply isn't fair to judge GPT-5 by traditional benchmarks and scores. It's the start of something much bigger.

Model benchmarks, and why they can't paint the whole picture.

Until now, we have defined state-of-the-art models by running them through a set of incredibly difficult tests, such as Humanity's Last Exam, as well as task-specific benchmarks, like SWE-bench for software engineering.

These benchmarks are incredibly powerful for giving us an idea of how a model performs when problem-solving or reasoning. And as model intelligence has advanced, so have the tests we run against it.

GPT-5's scores fluctuate wildly across traditional benchmarks: it is simultaneously fantastic, nothing special, and everything in between. How is this possible?

The answer is tied to something entirely new:

Agnostic reasoning.

GPT-5 is the first officially agnostic AI release on the market. The idea behind agnostic AI is that the model can decide how much effort to put into reasoning about a given prompt, and use that budget to balance speed against depth.
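To make that concrete, here's a minimal sketch of what this control looks like from the developer side, using the OpenAI Python SDK's Responses API. The exact effort level names and the example prompts are my reading of the public docs at launch; treat them as assumptions rather than a definitive reference.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Low effort: a quick, shallow response for a simple prompt.
quick = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input="In one sentence, what does HTTP status 429 mean?",
)

# High effort: the model spends far more reasoning tokens before answering.
deep = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},
    input="Prove or disprove: every bounded monotone sequence of reals converges.",
)

print(quick.output_text)
print(deep.output_text)
```

In ChatGPT itself, the model (or a router in front of it) makes this call for you. Either way, "how hard should I think?" is now an explicit dial, which I suspect is exactly why a single benchmark score stops telling the whole story.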

No shift as large as this is going to be perfect, and it requires a lot of relearning once again, especially if you've been used to prompting the o-series of models.

However, this shift in reasoning power has created what I think is the real secret sauce: the thing that blurs the line between a release and a paradigm shift.

The model's methods of expression.

GPT-5 feels more human than ever. I've found myself asking it questions in a very different way than I used to, simply because it feels more trustworthy. It offers more caveats now, and is more open about its own operations and limitations.

It may feel dumber on the surface, but that lowered intelligence almost feels like demoting the LLM from something that presents itself as an almighty being to something more in line with a companion.

My theory is that OpenAI have noticed the shift in how users want to talk to their chatbots: they know that memory and a personalized, grounded experience are what users are looking for, and this is absolutely the direction they intend to head in.

However, this introduces its own problem.

Let's talk safety.

With added emotion and added personality comes more data, and more responsibility. AI psychosis and the dangers around overconfidence and hallucinations are a bigger threat than ever before, and with that comes a greater responsibility to walk the fine line between growth and encouragement on one side, and delusion on the other.

GPT-5 does claim to hallucinate less, and that does seem to be true; it also seems to be better at admitting when it doesn't know something.

However, even when I explicitly instruct it otherwise, it still feels like it glazes me in a way that makes me question it. I do fear it will create a new kind of imposter syndrome: either a reliance on ChatGPT, which I have already seen with my own eyes, or a mistrust of everything.

Final words

Overall, GPT-5 is... confusing. It feels like both the best frontier model we've ever had and the worst at the same time. It feels more human than ever, but also more intimidating than ever.

This is a new frontier, and one whose impacts will be felt, even if they are near impossible to describe.

And yet, this is still just the beginning.