Tutorials, deep dives and product notes — built for developers.
GLM-5.2 vs GLM-5.1: the full sibling comparison. DeepSWE +28.2 (18.0→46.2), HMMT +9.9, GPQA +5.0, SWE-bench Pro +3.7. Context window 200K→1M (5×). Single→dual thinking modes. Anthropic API native. Same MIT license, same $4.40/1M tokens. All data from the Z.ai official blog.
GLM-5.2 (62.1% on SWE-bench Pro, MIT open-weight, $4.40/1M) beats GPT-5.5 (58.6%, $30/1M) by 3.5 points at roughly 1/7 the cost. It also leads on HLE w/tools (+2.5), FrontierSWE (+1.8), and MCP Atlas (+1.7). GPT-5.5 counters on DeepSWE (+23.8) and TB 2.1 (+3.0). Full comparison across 12 shared benchmarks, using Z.ai/VentureBeat data.
What SWE-bench Pro actually measures, how it works (1,865 tasks, 41 repos, 123 languages), why OpenAI abandoned SWE-bench Verified, the DeepSWE audit that found 32% verifier errors, and how to use coding benchmarks correctly. The definitive explainer.