What benchmarks miss about LLMs
Benchmarks do not tell the whole story.
Writing about code, tools, and the web.
Kimi K2.5 Turbo and fast iteration. Why I prefer speed over waiting for big models.
What works, what worries me, and the main rules I follow.
Making pointless things with code is really cool.