trestad / Factual-Recall-MechanismLinks

The code for paper Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models.

☆13

Alternatives and similar repositories for Factual-Recall-Mechanism

Users that are interested in Factual-Recall-Mechanism are comparing it to the libraries listed below

Sorting:

trestad / mitigating-reversal-curse
Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse'
☆13Updated 10 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆73Updated 4 months ago
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆41Updated 11 months ago
hkust-nlp / PEM_composition
[NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61Updated last year
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆112Updated last year
hkust-nlp / Laser
Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆48Updated last month
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆73Updated last week
OpenSparseLLMs / LLaMA-MoE-v2
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
☆86Updated 6 months ago
RZFan525 / Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs
☆72Updated 2 years ago
Lingkai-Kong / RE-Control
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆32Updated 4 months ago
zhyang2226 / AR-Lopti
[arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
☆32Updated last month
RUCBM / DeepCritic
Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"
☆30Updated last month
Joshua-Ren / Learning_dynamics_LLM
☆139Updated last month
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆65Updated 3 weeks ago
xuyige / SoftCoT
ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…
☆28Updated 3 weeks ago
WANGXinyiLinda / planning_tokens
Official code for Guiding Language Model Math Reasoning with Planning Tokens
☆13Updated last year
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆28Updated 2 months ago
jianghoucheng / AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
☆271Updated last week
xydaytoy / EVA
☆13Updated last year
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆74Updated last week
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆29Updated 7 months ago
hemingkx / TokenSkip
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
☆156Updated 3 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆88Updated 8 months ago
edenbiran / RippleEdits
Evaluating the Ripple Effects of Knowledge Editing in Language Models
☆55Updated last year
Furyton / awesome-language-model-analysis
This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers…
☆85Updated 6 months ago
TsinghuaC3I / MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
☆140Updated 2 weeks ago
kkk-an / UltraIF
Code of paper 'UltraIF: Advancing Instruction Following from the Wild'.
☆17Updated 2 months ago
haotiansun14 / BBox-Adapter
Lightweight Adapting for Black-Box Large Language Models
☆22Updated last year
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆22Updated 6 months ago
abhishekpanigrahi1996 / Skill-Localization-by-grafting
☆49Updated last year