Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124 · Sep 9, 2024 · Updated last year
Alternatives and similar repositories for easy-to-hard
Users that are interested in easy-to-hard are comparing it to the libraries listed below.
- Simple and efficient pytorch-native transformer training and inference (batched) ☆79 · Apr 2, 2024 · Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆121 · Dec 10, 2024 · Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · May 21, 2025 · Updated 10 months ago
- ☆73 · Apr 2, 2024 · Updated last year
- Collections of RLxLM experiments using minimal codes ☆14 · Feb 17, 2025 · Updated last year
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆22 · Feb 16, 2025 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆170 · Sep 18, 2025 · Updated 6 months ago
- GenRM-CoT: Data release for verification rationales ☆67 · Oct 16, 2024 · Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆71 · Jul 13, 2025 · Updated 8 months ago
- ☆322 · Sep 18, 2024 · Updated last year
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning ☆50 · Oct 11, 2024 · Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆147 · Sep 20, 2024 · Updated last year
- ☆342 · Jun 5, 2025 · Updated 9 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆695 · Jan 20, 2025 · Updated last year
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆63 · Dec 5, 2024 · Updated last year
- Direct preference optimization with f-divergences. ☆16 · Nov 3, 2024 · Updated last year
- The Lean Theorem Proving Environment ☆15 · May 7, 2023 · Updated 2 years ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ☆591 · Dec 9, 2024 · Updated last year
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,234 · May 8, 2024 · Updated last year
- Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptat… ☆15 · Feb 27, 2025 · Updated last year
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation ☆11 · Dec 23, 2023 · Updated 2 years ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆78 · Oct 9, 2025 · Updated 5 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 · Jul 8, 2024 · Updated last year
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆73 · Feb 25, 2025 · Updated last year
- A minimal re-implementation of orthogonal fine-tuning (OFT), a diffusion method, for LLMs. Based on nanoGPT and minLoRA. ☆14 · Nov 17, 2023 · Updated 2 years ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Jan 17, 2024 · Updated 2 years ago
- ☆44 · Sep 19, 2024 · Updated last year
- ☆44 · Nov 17, 2024 · Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆189 · May 25, 2025 · Updated 10 months ago
- ☆23 · Dec 18, 2024 · Updated last year
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models ☆454 · Feb 1, 2024 · Updated 2 years ago
- Neural theorem proving tutorial, version II ☆40 · Apr 26, 2024 · Updated last year
- ☆38 · May 2, 2024 · Updated last year
- LLMs + Lean, on your laptop or in the cloud ☆204 · Oct 10, 2025 · Updated 5 months ago
- ☆334 · May 31, 2025 · Updated 9 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆107 · Mar 6, 2025 · Updated last year
- Interesting ATP Proofs ☆13 · Sep 3, 2021 · Updated 4 years ago
- Neural theorem proving evaluation via the Lean REPL ☆23 · Jul 12, 2025 · Updated 8 months ago