#AI benchmarks

Tutorials, deep dives and product notes — built for developers.

Hy3 vs DeepSeek V4 Pro: Open-Weight Showdown — Tencent's Dark Horse Edges Out DeepSeek

Tencent's 295B MoE Hy3 just took the fight to DeepSeek's 1.6T V4 Pro and won on 12 of 18 shared benchmarks. Pricing is close: Hy3 is cheaper on fresh input/output, while V4 Pro's disk caching is 16.5× cheaper on repeated contexts. Full breakdown.

Jul 7, 2026 · 2.4K views · Abdeladim Fadheli

The AI Coding Revolution: Tracking 27 Months of Benchmark Progress (March 2024 – May 2026)

From 33.4% Verified to 93.9% — Fable 5 breaks 90%. GPT-5.5's 47-day Terminal-Bench reign ends. Track 27 months of AI coding progress with new charts. Updated June 9, 2026.

Jun 1, 2026 · 428 views · Abdeladim Fadheli

The Context Window Lie: How Well AI Models Actually Use 1M Tokens in 2026

Every AI model claims a 1M-token context window. But only GPT-5.5 and Claude Opus 4.6 actually use it. We analyzed MRCR v2, NIAH-2, and Graphwalks to show the 60-point gap between the best and worst "1M-capable" models — and which one to trust for long-context coding.

May 29, 2026 · 3.2K views · Abdeladim Fadheli