ICLR 2026 LIT Workshop · Under review at EMNLP 2026

Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning

Donald Ye, Max Loffgren, Om Kotadia, Linus Wong, and Jonas Rohweder

ICLR 2026 Workshop on LLM Interpretability & Transparency (LIT) · arXiv:2602.11201


Abstract

Chain-of-thought (CoT) prompting is widely assumed to elicit faithful reasoning in large language models — yet whether intermediate steps causally drive the final answer remains poorly understood. We introduce Normalized Logit Difference Decay (NLDD), a metric that quantifies the causal influence of each reasoning step on the final prediction via targeted corruptions. Across Gemma-2 and related models, we identify a Reasoning Horizon k* at 70–85% of chain length: before k*, CoT steps are causally active; after k*, reasoning collapses into post-hoc rationalization. We further document an anti-faithful regime in Gemma-2 where later steps actively suppress correct reasoning.
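The abstract does not spell out the NLDD formula, so the following is only one plausible reading, not the authors' implementation. It assumes the logit difference (LD) is the correct-answer logit minus a foil logit, and that NLDD at step k is the fractional drop in LD when step k is corrupted: NLDD_k = (LD_clean - LD_corrupt,k) / LD_clean. The function names, the foil token, and the toy logits are hypothetical.

```python
from typing import List, Sequence


def logit_difference(logits: Sequence[float], correct: int, foil: int) -> float:
    """Logit of the correct answer token minus the logit of a foil token."""
    return logits[correct] - logits[foil]


def nldd_per_step(
    clean_logits: Sequence[float],
    corrupted_logits_per_step: Sequence[Sequence[float]],
    correct: int,
    foil: int,
) -> List[float]:
    """Hypothetical NLDD: fractional drop in logit difference when step k is corrupted.

    Assumes NLDD_k = (LD_clean - LD_corrupt_k) / LD_clean, where each entry of
    corrupted_logits_per_step holds the final-answer logits after corrupting step k.
    """
    ld_clean = logit_difference(clean_logits, correct, foil)
    if ld_clean == 0:
        raise ValueError("clean logit difference is zero; normalization is undefined")
    return [
        (ld_clean - logit_difference(logits, correct, foil)) / ld_clean
        for logits in corrupted_logits_per_step
    ]


# Toy final-answer logits over a 3-token vocabulary: correct = 0, foil = 1.
clean = [4.0, 1.0, 0.0]  # LD_clean = 3.0
corrupted = [
    [2.5, 1.0, 0.0],   # corrupt step 0: LD = 1.5
    [1.0, 1.0, 0.0],   # corrupt step 1: LD = 0.0
    [4.0, 1.0, 0.0],   # corrupt step 2: LD unchanged
    [4.75, 1.0, 0.0],  # corrupt step 3: LD rises to 3.75
]
print(nldd_per_step(clean, corrupted, correct=0, foil=1))
# [0.5, 1.0, 0.0, -0.25]
```

Under this reading, a score near 1 means corrupting the step wipes out the model's margin for the correct answer, a score near 0 means the step barely matters, and a negative score means corrupting the step actually increases that margin.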

Key Findings

Reasoning Horizon k* (70–85% of chain length). CoT steps before k* causally drive the answer. After k*, the model is rationalizing, not reasoning.

New metric: NLDD. Normalized Logit Difference Decay measures causal faithfulness at each step via targeted token corruptions.

Gemma-2 anomaly: anti-faithfulness. Later CoT steps in Gemma-2 actively suppress correct reasoning, a distinct failure mode not seen in other models.
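If NLDD_k is read as the fractional drop in the correct-answer logit difference when step k is corrupted (an assumption; the page does not give the formula), then a negative score means corrupting the step raises the margin for the correct answer, i.e. the step was suppressing it. The sketch below uses that reading to label steps with the regimes named above; the `active_threshold` cutoff, the function name, and the three labels are hypothetical choices, not values from the paper.

```python
from typing import List, Sequence


def label_regimes(nldd: Sequence[float], active_threshold: float = 0.1) -> List[str]:
    """Map each step's NLDD score to a coarse regime label.

    Hypothetical rule (not from the paper):
      score >= active_threshold -> "causal"          (corrupting the step hurts the answer)
      score < 0                 -> "anti-faithful"   (corrupting the step helps the answer)
      otherwise                 -> "rationalization" (step has little causal effect)
    """
    if active_threshold <= 0:
        raise ValueError("active_threshold must be positive")
    labels: List[str] = []
    for score in nldd:
        if score >= active_threshold:
            labels.append("causal")
        elif score < 0:
            labels.append("anti-faithful")
        else:
            labels.append("rationalization")
    return labels


print(label_regimes([0.5, 1.0, 0.02, -0.25]))
# ['causal', 'causal', 'rationalization', 'anti-faithful']
```

A fixed threshold is the simplest choice for illustration; in practice the cutoff separating "causal" from "rationalization" would need to be calibrated against the noise in the corruption procedure.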


Figure 1. Left: the causal intervention framework (corrupting step k and measuring the resulting logit shift). Right: the Reasoning Horizon curve, where NLDD peaks at k* and decays after the horizon.
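Reading k* off a curve like the one in the right panel can be sketched as taking the step where NLDD peaks and expressing its position as a fraction of chain length, the form in which the 70–85% range is reported. The 1-based position convention ((k* + 1) / n), the tie-breaking rule, and the function name are assumptions for illustration, and the toy curve is made up rather than taken from the paper.

```python
from typing import Sequence, Tuple


def reasoning_horizon(nldd: Sequence[float]) -> Tuple[int, float]:
    """Locate the peak of an NLDD curve.

    Returns (k_star, fraction): k_star is the 0-based index of the highest NLDD
    score (the first one on ties), and fraction is its 1-based position divided
    by the chain length. Both conventions are illustrative assumptions.
    """
    if len(nldd) == 0:
        raise ValueError("NLDD curve is empty")
    k_star = max(range(len(nldd)), key=lambda k: nldd[k])
    return k_star, (k_star + 1) / len(nldd)


# Made-up 10-step curve that rises, peaks at index 7, then decays.
curve = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.3, -0.1]
print(reasoning_horizon(curve))  # (7, 0.8)
```

Taking a raw argmax is sensitive to noise on a single step; smoothing the curve or averaging over many problems before locating the peak would be a natural refinement.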

Citation

BibTeX

@inproceedings{ye2026mechanistic,
  title={Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning},
  author={Donald Ye and Max Loffgren and Om Kotadia and Linus Wong and Jonas Rohweder},
  booktitle={Workshop on Latent {\&} Implicit Thinking {\textendash} Going Beyond CoT Reasoning},
  year={2026},
  url={https://openreview.net/forum?id=wVj7dB7waI}
}
