Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
⭐124 · Sep 9, 2024 · Updated last year
Alternatives and similar repositories for easy-to-hard
Users who are interested in easy-to-hard are comparing it to the repositories listed below.
- Simple and efficient pytorch-native transformer training and inference (batched) ⭐79 · Apr 2, 2024 · Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ⭐120 · Dec 10, 2024 · Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ⭐86 · May 21, 2025 · Updated 8 months ago
- Self-Alignment with Principle-Following Reward Models ⭐169 · Sep 18, 2025 · Updated 4 months ago
- Collections of RLxLM experiments using minimal codes ⭐14 · Feb 17, 2025 · Updated 11 months ago
- ⭐72 · Apr 2, 2024 · Updated last year
- GenRM-CoT: Data release for verification rationales ⭐68 · Oct 16, 2024 · Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ⭐147 · Sep 20, 2024 · Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ⭐70 · Jul 13, 2025 · Updated 7 months ago
- ⭐321 · Sep 18, 2024 · Updated last year
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference" ⭐21 · Feb 16, 2025 · Updated last year
- ⭐342 · Jun 5, 2025 · Updated 8 months ago
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning ⭐50 · Oct 11, 2024 · Updated last year
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery ⭐20 · Sep 24, 2025 · Updated 4 months ago
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ⭐11 · Jan 10, 2025 · Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ⭐588 · Dec 9, 2024 · Updated last year
- ⭐39 · May 2, 2024 · Updated last year
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ⭐77 · Oct 9, 2025 · Updated 4 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ⭐690 · Jan 20, 2025 · Updated last year
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ⭐63 · Dec 5, 2024 · Updated last year
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua… ⭐13 · Nov 11, 2024 · Updated last year
- The Lean Theorem Proving Environment ⭐14 · May 7, 2023 · Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ⭐64 · Jul 8, 2024 · Updated last year
- ⭐42 · Sep 19, 2024 · Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ⭐147 · Apr 9, 2025 · Updated 10 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ⭐124 · Mar 22, 2024 · Updated last year
- ⭐44 · Nov 17, 2024 · Updated last year
- ⭐23 · Dec 18, 2024 · Updated last year
- PyTorch implementation of StableMask (ICML'24) ⭐15 · Jun 27, 2024 · Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ⭐391 · Jan 19, 2025 · Updated last year
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models ⭐454 · Feb 1, 2024 · Updated 2 years ago
- The official implementation of Self-Play Fine-Tuning (SPIN) ⭐1,234 · May 8, 2024 · Updated last year
- ⭐331 · May 31, 2025 · Updated 8 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ⭐72 · Feb 25, 2025 · Updated 11 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ⭐48 · Jan 17, 2024 · Updated 2 years ago
- ⭐123 · Feb 21, 2025 · Updated 11 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ⭐47 · Apr 15, 2025 · Updated 10 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ⭐107 · Mar 6, 2025 · Updated 11 months ago
- ⭐17 · Dec 21, 2023 · Updated 2 years ago