Repository for "Training Language Models To Explain Their Own Computations"
☆21Dec 22, 2025Updated 2 months ago
Alternatives and similar repositories for introspective-interp
Users who are interested in introspective-interp are comparing it to the libraries listed below
- ☆17Aug 30, 2025Updated 6 months ago
- Batch-download courseware from the Peking University course website☆12Apr 8, 2023Updated 2 years ago
- Code for the paper "Refining Language Model with Compositional Explanation" (NeurIPS 2021)☆12Oct 25, 2021Updated 4 years ago
- ☆12Sep 6, 2024Updated last year
- ☆13Jul 26, 2023Updated 2 years ago
- ☆26Jun 12, 2023Updated 2 years ago
- LLM benchmarks☆13Feb 22, 2024Updated 2 years ago
- VertMetric: An abstractive summarization evaluation package. VERT stands for Versatile Evaluation of Reduced Texts.☆11Dec 20, 2018Updated 7 years ago
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals☆12May 24, 2024Updated last year
- HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking☆12Apr 11, 2025Updated 11 months ago
- Convenient Course Query Website☆20Sep 11, 2024Updated last year
- ☆75Updated this week
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models☆12May 30, 2025Updated 9 months ago
- A Node.js / Neo4j tool that translates words and relations into network graphs and shows you how it all connects.☆11Oct 24, 2019Updated 6 years ago
- ☆11Mar 9, 2025Updated last year
- ☆19Sep 16, 2025Updated 6 months ago
- A library for training crosscoders☆16May 28, 2025Updated 9 months ago
- Code and materials for "Weird Generalization and Inductive Backdoors"☆36Jan 11, 2026Updated 2 months ago
- A Blackjack game with GUI written in Java.☆11Nov 21, 2018Updated 7 years ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.☆18Apr 25, 2021Updated 4 years ago
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.☆16Nov 21, 2025Updated 3 months ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales"☆16Aug 11, 2023Updated 2 years ago
- Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Expl…☆12Apr 4, 2025Updated 11 months ago
- Converts Quora's new NLU dataset to SNLI txt/jsonl format, plus test/dev split, tokenization.☆14Jan 27, 2017Updated 9 years ago
- Manage ML configuration with pydantic☆16Updated this week
- Fast Axiomatic Attribution for Neural Networks (NeurIPS*2021)☆15Feb 24, 2026Updated 3 weeks ago
- CSCW 2023 Best Demo Award: Conversational AI Explanations to Support Human-AI Scientific Writing☆14Jun 25, 2023Updated 2 years ago
- ☆18Apr 16, 2021Updated 4 years ago
- Slop Scoring to Stop Slop☆56Updated this week
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆90Mar 18, 2025Updated last year
- Implementations of several self-supervised pretext tasks for language and vision modalities in PyTorch.☆13Jan 19, 2021Updated 5 years ago
- The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects"☆20May 2, 2025Updated 10 months ago
- Find informative examples to efficiently (human)-evaluate NLG models.☆18Feb 27, 2026Updated 3 weeks ago
- Tokenize and clean strings in Python☆11Jan 11, 2018Updated 8 years ago
- ☆18Oct 6, 2022Updated 3 years ago
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…☆22Nov 2, 2021Updated 4 years ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination"☆28Dec 18, 2024Updated last year
- Testing paligemma2 finetuning on reasoning dataset☆18Dec 28, 2024Updated last year
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training☆23Aug 18, 2024Updated last year