Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124 · Sep 9, 2024 · Updated last year
Alternatives and similar repositories for easy-to-hard
Users who are interested in easy-to-hard are comparing it to the libraries listed below.
- Simple and efficient pytorch-native transformer training and inference (batched) ☆78 · Apr 2, 2024 · Updated 2 years ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆122 · Dec 10, 2024 · Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · May 21, 2025 · Updated last year
- ☆74 · Apr 2, 2024 · Updated 2 years ago
- Collections of RLxLM experiments using minimal codes ☆14 · Feb 17, 2025 · Updated last year
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆23 · Feb 16, 2025 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆170 · Sep 18, 2025 · Updated 8 months ago
- GenRM-CoT: Data release for verification rationales ☆68 · Oct 16, 2024 · Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆74 · Jul 13, 2025 · Updated 10 months ago
- ☆324 · Sep 18, 2024 · Updated last year
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning ☆50 · Oct 11, 2024 · Updated last year
- ☆340 · Jun 5, 2025 · Updated 11 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆147 · Sep 20, 2024 · Updated last year
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆705 · Jan 20, 2025 · Updated last year
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆63 · Dec 5, 2024 · Updated last year
- Direct preference optimization with f-divergences. ☆17 · Nov 3, 2024 · Updated last year
- The Lean Theorem Proving Environment ☆15 · May 7, 2023 · Updated 3 years ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ☆596 · Dec 9, 2024 · Updated last year
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,240 · May 8, 2024 · Updated 2 years ago
- Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptat… ☆16 · Feb 27, 2025 · Updated last year
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation ☆11 · Dec 23, 2023 · Updated 2 years ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆80 · Oct 9, 2025 · Updated 7 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆66 · Jul 8, 2024 · Updated last year
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- A minimal re-implementation of orthogonal fine-tuning (OFT), a diffusion method, for LLMs. Based on nanoGPT and minLoRA. ☆14 · Nov 17, 2023 · Updated 2 years ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Jan 17, 2024 · Updated 2 years ago
- ☆44 · Sep 19, 2024 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆72 · Feb 25, 2025 · Updated last year
- ☆44 · Nov 17, 2024 · Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆190 · May 25, 2025 · Updated last year
- ☆22 · Dec 18, 2024 · Updated last year
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models ☆456 · Feb 1, 2024 · Updated 2 years ago
- Neural theorem proving tutorial, version II ☆40 · Apr 26, 2024 · Updated 2 years ago
- ☆39 · May 2, 2024 · Updated 2 years ago
- LLMs + Lean, on your laptop or in the cloud ☆210 · Oct 10, 2025 · Updated 7 months ago
- ☆337 · May 31, 2025 · Updated 11 months ago
- Neural theorem proving evaluation via the Lean REPL ☆23 · Jul 12, 2025 · Updated 10 months ago
- Interesting ATP Proofs ☆13 · Sep 3, 2021 · Updated 4 years ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆106 · Mar 6, 2025 · Updated last year