justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆230Updated 10 months ago
Alternatives and similar repositories for openlogprobs:
Users who are interested in openlogprobs are comparing it to the libraries listed below.
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).☆176Updated last month
- Sparse autoencoders☆407Updated this week
- ☆115Updated this week
- ☆201Updated 3 months ago
- ☆114Updated last year
- Mechanistic Interpretability Visualizations using React☆219Updated 3 weeks ago
- ☆135Updated this week
- Steering vectors for transformer language models in PyTorch / Hugging Face☆78Updated last month
- Erasing concepts from neural representations with provable guarantees☆219Updated last month
- ☆106Updated 5 months ago
- Improving Alignment and Robustness with Circuit Breakers☆174Updated 3 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024)☆182Updated 7 months ago
- Functional Benchmarks and the Reasoning Gap☆82Updated 3 months ago
- A toolkit for describing model features and intervening on those features to steer behavior.☆149Updated 2 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations.☆192Updated this week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'☆175Updated last month
- RuLES: a benchmark for evaluating rule-following in language models☆215Updated this week
- ☆247Updated 6 months ago
- ☆25Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪☆102Updated last month
- nanoGPT-like codebase for LLM training☆83Updated this week
- ☆135Updated 3 months ago
- Understand and test language model architectures on synthetic tasks.☆175Updated this week
- PyTorch library for Active Fine-Tuning☆52Updated last week
- Code for the paper "Fishing for Magikarp"☆139Updated this week
- Textbook on reinforcement learning from human feedback☆111Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs.☆90Updated last month
- Using sparse coding to find distributed representations used by neural networks.☆207Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆154Updated 2 months ago
- ☆75Updated 5 months ago