google-deepmind / bbeh
☆66 · Updated last month
Alternatives and similar repositories for bbeh:
Users who are interested in bbeh are comparing it to the repositories listed below.
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆47 · Updated 2 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆57 · Updated 4 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆94 · Updated last week
- Code for "Reasoning to Learn from Latent Thoughts" ☆91 · Updated 3 weeks ago
- ☆46 · Updated last month
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆80 · Updated 8 months ago
- Exploration of automated dataset selection approaches at large scales. ☆39 · Updated last month
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 · Updated 3 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆30 · Updated 10 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆53 · Updated 3 weeks ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆132 · Updated 7 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆62 · Updated this week
- The official repository of the Omni-MATH benchmark. ☆80 · Updated 4 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆60 · Updated last week
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆83 · Updated 7 months ago
- ☆59 · Updated 7 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆47 · Updated 3 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆56 · Updated 2 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆79 · Updated 3 weeks ago
- ☆50 · Updated 2 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆74 · Updated 10 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 · Updated 7 months ago
- Replicating O1 inference-time scaling laws ☆83 · Updated 4 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 · Updated 7 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆60 · Updated 4 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆29 · Updated 7 months ago
- Long Context Extension and Generalization in LLMs ☆53 · Updated 7 months ago
- ☆57 · Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆139 · Updated this week
- ☆43 · Updated 8 months ago