Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆124 · Sep 9, 2024 · Updated last year
Alternatives and similar repositories for easy-to-hard
Users who are interested in easy-to-hard are comparing it to the libraries listed below.
- Simple and efficient pytorch-native transformer training and inference (batched) ☆78 · Apr 2, 2024 · Updated 2 years ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆121 · Dec 10, 2024 · Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · May 21, 2025 · Updated 11 months ago
- ☆74 · Apr 2, 2024 · Updated 2 years ago
- Collections of RLxLM experiments using minimal codes ☆14 · Feb 17, 2025 · Updated last year
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ☆22 · Feb 16, 2025 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆170 · Sep 18, 2025 · Updated 7 months ago
- GenRM-CoT: Data release for verification rationales ☆68 · Oct 16, 2024 · Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆74 · Jul 13, 2025 · Updated 9 months ago
- ☆323 · Sep 18, 2024 · Updated last year
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning ☆50 · Oct 11, 2024 · Updated last year
- ☆341 · Jun 5, 2025 · Updated 11 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆147 · Sep 20, 2024 · Updated last year
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆702 · Jan 20, 2025 · Updated last year
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆63 · Dec 5, 2024 · Updated last year
- Direct preference optimization with f-divergences. ☆16 · Nov 3, 2024 · Updated last year
- The Lean Theorem Proving Environment ☆15 · May 7, 2023 · Updated 3 years ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ☆593 · Dec 9, 2024 · Updated last year
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,239 · May 8, 2024 · Updated 2 years ago
- Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptat… ☆15 · Feb 27, 2025 · Updated last year
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆80 · Oct 9, 2025 · Updated 7 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆65 · Jul 8, 2024 · Updated last year
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- A minimal re-implementation of orthogonal fine-tuning (OFT), a diffusion method, for LLMs. Based on nanoGPT and minLoRA. ☆14 · Nov 17, 2023 · Updated 2 years ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 · Jan 17, 2024 · Updated 2 years ago
- ☆44 · Sep 19, 2024 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆72 · Feb 25, 2025 · Updated last year
- ☆44 · Nov 17, 2024 · Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆190 · May 25, 2025 · Updated 11 months ago
- ☆22 · Dec 18, 2024 · Updated last year
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models ☆455 · Feb 1, 2024 · Updated 2 years ago
- Neural theorem proving tutorial, version II ☆40 · Apr 26, 2024 · Updated 2 years ago
- ☆39 · May 2, 2024 · Updated 2 years ago
- LLMs + Lean, on your laptop or in the cloud ☆208 · Oct 10, 2025 · Updated 6 months ago
- ☆337 · May 31, 2025 · Updated 11 months ago
- Interesting ATP Proofs ☆13 · Sep 3, 2021 · Updated 4 years ago
- Neural theorem proving evaluation via the Lean REPL ☆23 · Jul 12, 2025 · Updated 9 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆106 · Mar 6, 2025 · Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆148 · Apr 9, 2025 · Updated last year