WDCD Run #100: Average Instruction Decay Hits 39.1% Across 11 Models, Claude Opus 4.7 Leads

WDCD Run #100 (2026-05-03) tested 11 frontier models on multi-turn commitment integrity, recording an average instruction…

Tags: WDCD, AI benchmark, instruction decay, multi-turn