lee-ny / teaching_arithmetic
☆79 · Updated last year
Alternatives and similar repositories for teaching_arithmetic:
Users interested in teaching_arithmetic are comparing it to the libraries listed below.
- Language models scale reliably with over-training and on downstream tasks · ☆96 · Updated last year
- ☆34 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" · ☆103 · Updated last year
- [NeurIPS'24 Spotlight] Observational Scaling Laws · ☆54 · Updated 6 months ago
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… · ☆16 · Updated 5 months ago
- ☆93 · Updated last year
- ☆96 · Updated 9 months ago
- ☆51 · Updated 11 months ago
- ☆175 · Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" · ☆151 · Updated 5 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers" (NeurIPS 2023) · ☆135 · Updated 11 months ago
- ☆90 · Updated 9 months ago
- ☆83 · Updated last year
- ☆91 · Updated 2 months ago
- A library for efficient patching and automatic circuit discovery · ☆63 · Updated this week
- Replicating O1 inference-time scaling laws · ☆83 · Updated 4 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs · ☆53 · Updated last year
- Code and configs for "Asynchronous RLHF: Faster and More Efficient RL for Language Models" · ☆45 · Updated 3 weeks ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision · ☆120 · Updated 7 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging · ☆99 · Updated last year
- ☆40 · Updated last year
- Simple and efficient PyTorch-native transformer training and inference (batched) · ☆73 · Updated last year
- ☆82 · Updated 8 months ago
- Function Vectors in Large Language Models (ICLR 2024) · ☆162 · Updated last week
- ☆25 · Updated last year
- Understand and test language model architectures on synthetic tasks · ☆192 · Updated last month
- AI Logging for Interpretability and Explainability 🔬 · ☆111 · Updated 10 months ago
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories" by Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… · ☆90 · Updated 3 years ago
- ☆114 · Updated 8 months ago
- ☆104 · Updated 5 months ago