csinva / interpretable-embeddings
Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)
☆45 Updated 11 months ago
Alternatives and similar repositories for interpretable-embeddings
Users who are interested in interpretable-embeddings are comparing it to the libraries listed below.
- ☆184 Updated last year
- ☆108 Updated 8 months ago
- The Prism Alignment Project ☆83 Updated last year
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆130 Updated last year
- ☆98 Updated 2 years ago
- A library for efficient patching and automatic circuit discovery. ☆78 Updated 3 months ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆106 Updated last year
- Exploring the Limitations of Large Language Models on Multi-Hop Queries ☆27 Updated 8 months ago
- ☆103 Updated last year
- Data and code for the Corr2Cause paper (ICLR 2024) ☆111 Updated last year
- ☆128 Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆181 Updated 6 months ago
- ☆92 Updated last year
- ☆241 Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆119 Updated 2 years ago
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆97 Updated 4 years ago
- PASTA: Post-hoc Attention Steering for LLMs ☆126 Updated 11 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆118 Updated last year
- ☆23 Updated 9 months ago
- ☆111 Updated 8 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆124 Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆136 Updated 4 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆221 Updated last week
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆59 Updated last year
- ☆164 Updated 11 months ago
- ☆33 Updated last year
- Lightweight Adapting for Black-Box Large Language Models ☆23 Updated last year
- ☆23 Updated last year
- ☆37 Updated last year
- ☆183 Updated 11 months ago