Chain-of-thought (CoT) prompting is widely assumed to elicit faithful reasoning in large language models — yet whether intermediate steps causally drive the final answer remains poorly understood. We introduce Normalized Logit Difference Decay (NLDD), a metric that quantifies the causal influence of each reasoning step on the final prediction via targeted corruptions. Across Gemma-2 and related models, we identify a Reasoning Horizon k* at 70–85% of chain length: before k*, CoT steps are causally active; after k*, reasoning collapses into post-hoc rationalization. We further document an anti-faithful regime in Gemma-2 where later steps actively suppress correct reasoning.
Figure 1. Left: the causal intervention framework — corrupting step k and measuring logit shift. Right: the Reasoning Horizon curve, where NLDD peaks at k* and decays post-horizon.
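The intervention in Figure 1 can be illustrated with a minimal sketch. The text above does not give the exact NLDD formula, so the normalization used here (the fraction of the clean logit difference lost when step k is corrupted) is an assumption, and the helper names `logit_difference`, `nldd`, `nldd_curve`, and `reasoning_horizon` are hypothetical. The logit values are toy numbers, not results; in practice they would come from model forward passes on clean and step-corrupted chains.

```python
from typing import List, Sequence


def logit_difference(logits: Sequence[float], correct: int, distractor: int) -> float:
    """Margin between the correct answer token and a distractor token."""
    return logits[correct] - logits[distractor]


def nldd(clean_ld: float, corrupted_ld: float) -> float:
    """ASSUMED normalization: share of the clean margin lost after corruption.

    1.0 means the corruption erased the whole margin, 0.0 means it had no
    effect, and a negative value means the corruption increased the margin.
    """
    if clean_ld == 0:
        raise ValueError("clean logit difference must be non-zero to normalize")
    return (clean_ld - corrupted_ld) / abs(clean_ld)


def nldd_curve(clean_ld: float, corrupted_lds: Sequence[float]) -> List[float]:
    """NLDD for each step k, given the margin measured after corrupting step k."""
    return [nldd(clean_ld, ld) for ld in corrupted_lds]


def reasoning_horizon(curve: Sequence[float]) -> int:
    """1-based index of the peak of the curve, taken here as k*.

    This follows the figure caption (NLDD peaks at k*); the source may use a
    different operational definition.
    """
    return max(range(len(curve)), key=lambda i: curve[i]) + 1


if __name__ == "__main__":
    # Toy margins after corrupting steps 1..6 of a six-step chain (illustrative only).
    clean = 4.0
    corrupted = [3.0, 2.0, 1.0, 0.0, 4.0, 5.0]
    curve = nldd_curve(clean, corrupted)
    print(curve)                     # [0.25, 0.5, 0.75, 1.0, 0.0, -0.25]
    print(reasoning_horizon(curve))  # 4
```

In this toy curve, corrupting steps after the peak leaves the margin unchanged or even raises it. Under the assumed normalization, a near-zero value would correspond to post-hoc steps and a negative value to the anti-faithful regime described above.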