Exploring the Limitations of Large Language Models on Multi-Hop Queries
☆32 · Mar 2, 2025 · Updated last year
Alternatives and similar repositories for HoppingTooLate
Users who are interested in HoppingTooLate are comparing it to the repositories listed below.
- ☆13 · Oct 5, 2025 · Updated 5 months ago
- ☆70 · Mar 6, 2025 · Updated last year
- ☆10 · Nov 6, 2024 · Updated last year
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition" ☆14 · Nov 22, 2024 · Updated last year
- Confidence Regulation Neurons in Language Models (NeurIPS 2024) ☆15 · Feb 1, 2025 · Updated last year
- Attribution-based Parameter Decomposition ☆34 · Jun 11, 2025 · Updated 8 months ago
- CS194-196 Course Project ☆14 · Feb 20, 2025 · Updated last year
- A benchmark for mechanistic discovery of circuits in Transformers ☆16 · Dec 15, 2024 · Updated last year
- A Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation, Levy et al., Findings of EMNLP 2021 ☆14 · Apr 3, 2022 · Updated 3 years ago
- ☆18 · Nov 5, 2025 · Updated 4 months ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · Apr 25, 2021 · Updated 4 years ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut… ☆23 · Dec 4, 2024 · Updated last year
- Code for our paper "Decomposing The Dark Matter of Sparse Autoencoders" ☆23 · Feb 6, 2025 · Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆57 · Oct 30, 2025 · Updated 4 months ago
- ☆41 · Jun 11, 2025 · Updated 8 months ago
- Exploring Few-Shot Adaptation of Language Models with Tables ☆24 · Aug 22, 2022 · Updated 3 years ago
- Code to enable layer-level steering in LLMs using sparse auto encoders ☆31 · Sep 18, 2025 · Updated 5 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆64 · Oct 27, 2024 · Updated last year
- ☆70 · Jun 18, 2025 · Updated 8 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆76 · Jan 16, 2026 · Updated last month
- The official repository containing the source code to the explAIner publication. ☆32 · Apr 29, 2024 · Updated last year
- This repository collects all relevant resources about interpretability in LLMs ☆390 · Nov 1, 2024 · Updated last year
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University ☆307 · Feb 8, 2026 · Updated last month
- A library for efficient patching and automatic circuit discovery. ☆91 · Dec 31, 2025 · Updated 2 months ago
- Methods and evaluation for aligning language models temporally ☆30 · Mar 2, 2024 · Updated 2 years ago
- Measuring the situational awareness of language models ☆40 · Feb 12, 2024 · Updated 2 years ago
- A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution ☆35 · Aug 2, 2023 · Updated 2 years ago
- ☆84 · Feb 25, 2025 · Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆234 · Jul 19, 2025 · Updated 7 months ago
- Build an AI bot in Discord to serve user's personalized reports on what's up in tech ☆28 · Sep 14, 2025 · Updated 5 months ago
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ☆12 · May 30, 2025 · Updated 9 months ago
- Training code for Sparse Autoencoders on Embedding models ☆39 · Feb 27, 2025 · Updated last year
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆164 · Nov 14, 2025 · Updated 3 months ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆16 · Dec 10, 2024 · Updated last year
- ☆12 · Jul 8, 2024 · Updated last year
- Token-free Language Modeling with ByGPT5 & Friends! ☆12 · Jul 18, 2025 · Updated 7 months ago
- Evaluation Pipeline for medical tasks. ☆12 · Feb 13, 2026 · Updated 3 weeks ago
- A framework for evaluating Machine Translation models. ☆12 · May 26, 2025 · Updated 9 months ago
- Reference implementation of Thin and Deep Gaussian Processes (NeurIPS 2023) ☆14 · Nov 25, 2024 · Updated last year