☆29 Apr 30, 2024 Updated last year
Alternatives and similar repositories for overthinking_the_truth
Users who are interested in overthinking_the_truth are comparing it to the repositories listed below
- ☆13 Jul 2, 2025 Updated 8 months ago
- ☆13 Feb 25, 2025 Updated last year
- MSP project: Latent Space Factorisation and Manipulation via Matrix Subspace Projection (ICML2020) ☆14 Dec 4, 2021 Updated 4 years ago
- ☆52 Oct 23, 2023 Updated 2 years ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut… ☆23 Dec 4, 2024 Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 Jan 17, 2024 Updated 2 years ago
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis" ☆19 Jun 12, 2025 Updated 8 months ago
- Self-Supervised Alignment with Mutual Information ☆20 May 24, 2024 Updated last year
- How do transformer LMs encode relations? ☆56 Feb 24, 2024 Updated 2 years ago
- Code for the paper "Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression" ☆25 Jun 28, 2023 Updated 2 years ago
- Neuron Activation ☆26 Nov 21, 2024 Updated last year
- ☆28 May 29, 2024 Updated last year
- Data Valuation on In-Context Examples (ACL23) ☆24 Jan 12, 2025 Updated last year
- Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models [CVPR 2024] ☆25 Oct 7, 2024 Updated last year
- Augmenting Statistical Models with Natural Language Parameters ☆29 Sep 17, 2024 Updated last year
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆26 Mar 10, 2025 Updated 11 months ago
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆25 Feb 16, 2026 Updated 2 weeks ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆64 Oct 27, 2024 Updated last year
- Sparse and discrete interpretability tool for neural networks ☆64 Feb 12, 2024 Updated 2 years ago
- Exploring the Limitations of Large Language Models on Multi-Hop Queries ☆32 Mar 2, 2025 Updated last year
- Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers" ☆174 May 4, 2024 Updated last year
- ☆177 Jul 24, 2024 Updated last year
- ☆28 Sep 21, 2024 Updated last year
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data) ☆28 Dec 19, 2023 Updated 2 years ago
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers. ☆11 Aug 30, 2024 Updated last year
- Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation ☆32 May 21, 2023 Updated 2 years ago
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models) ☆52 May 12, 2025 Updated 9 months ago
- ☆35 Sep 13, 2023 Updated 2 years ago
- Integrating Graph Knowledge into End-to-End Task-Oriented Dialogue Systems ☆29 Apr 15, 2021 Updated 4 years ago
- Locating and editing factual associations in GPT (NeurIPS 2022) ☆730 Apr 20, 2024 Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 Jun 10, 2024 Updated last year
- Utilities for the HuggingFace transformers library ☆75 Jan 21, 2023 Updated 3 years ago
- ☆10 Sep 29, 2023 Updated 2 years ago
- ☆10 Updated this week
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning ☆33 Jan 9, 2025 Updated last year
- ☆11 May 25, 2023 Updated 2 years ago
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 Oct 14, 2024 Updated last year
- Concurrency library ☆17 Oct 13, 2024 Updated last year
- ☆11 Dec 23, 2024 Updated last year