csinva / interpretable-embeddings
Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)
☆35 Updated 3 months ago
Alternatives and similar repositories for interpretable-embeddings:
Users who are interested in interpretable-embeddings are comparing it to the repositories listed below:
- ☆86 Updated last week
- ☆171 Updated last year
- ☆44 Updated 6 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆115 Updated 5 months ago
- ☆80 Updated 11 months ago
- ☆89 Updated last year
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆109 Updated 11 months ago
- Universal Neurons in GPT2 Language Models ☆27 Updated 8 months ago
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆28 Updated last year
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆103 Updated last year
- ☆79 Updated 7 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆19 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆53 Updated this week
- ☆26 Updated last year
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le… ☆71 Updated 11 months ago
- ☆21 Updated 4 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆121 Updated 3 months ago
- Online Adaptation of Language Models with a Memory of Amortized Contexts (NeurIPS 2024) ☆61 Updated 6 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆39 Updated 3 months ago
- Efficient empirical NTKs in PyTorch ☆18 Updated 2 years ago
- ☆76 Updated 6 months ago
- ☆95 Updated 7 months ago
- ☆60 Updated 2 years ago
- Data and code for the Corr2Cause paper (ICLR 2024) ☆93 Updated 10 months ago
- ☆83 Updated 7 months ago
- ☆22 Updated 6 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆138 Updated 4 months ago
- ☆109 Updated 6 months ago
- ☆78 Updated last year