justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆229 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for openlogprobs
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆158 · Updated last month
- Steering vectors for transformer language models in Pytorch / Huggingface ☆65 · Updated this week
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆180 · Updated 5 months ago
- Mechanistic Interpretability Visualizations using React ☆200 · Updated 4 months ago
- Sparse autoencoders ☆345 · Updated this week
- Erasing concepts from neural representations with provable guarantees ☆210 · Updated last week
- ☆109 · Updated this week
- Training Sparse Autoencoders on Language Models ☆481 · Updated this week
- ☆106 · Updated last month
- ☆101 · Updated 3 months ago
- ☆108 · Updated last year
- ☆188 · Updated last month
- code for training & evaluating Contextual Document Embedding models ☆119 · Updated this week
- ☆24 · Updated 7 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆154 · Updated last month
- A puzzle to learn about prompting ☆122 · Updated last year
- ☆44 · Updated this week
- ☆99 · Updated 3 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆432 · Updated 5 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- A toolkit for describing model features and intervening on those features to steer behavior. ☆107 · Updated 2 weeks ago
- Experiments for efforts to train a new and improved t5 ☆76 · Updated 7 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆178 · Updated 3 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆162 · Updated last month
- Understand and test language model architectures on synthetic tasks. ☆163 · Updated 6 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 · Updated 6 months ago
- RuLES: a benchmark for evaluating rule-following in language models ☆211 · Updated last month
- ☆240 · Updated 4 months ago
- ☆91 · Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆98 · Updated 6 months ago