Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆112 · Updated 4 months ago
Alternatives and similar repositories for easy-to-hard:
Users who are interested in easy-to-hard are comparing it to the libraries listed below.
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆114 · Updated 2 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆90 · Updated last month
- Repo of paper "Free Process Rewards without Process Labels" ☆110 · Updated last week
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆40 · Updated 2 months ago
- ☆47 · Updated 2 months ago
- GenRM-CoT: Data release for verification rationales ☆45 · Updated 3 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆105 · Updated 6 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 4 months ago
- ☆61 · Updated 9 months ago
- ☆129 · Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆126 · Updated 6 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆49 · Updated 8 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆44 · Updated last month
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆98 · Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆87 · Updated 11 months ago
- ☆34 · Updated 11 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆90 · Updated 3 months ago
- ☆86 · Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆135 · Updated 3 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆64 · Updated 5 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆96 · Updated 11 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆49 · Updated 7 months ago
- Directional Preference Alignment ☆54 · Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆50 · Updated 3 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆66 · Updated 2 weeks ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆114 · Updated 2 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆49 · Updated 3 months ago
- Self-Alignment with Principle-Following Reward Models ☆152 · Updated 11 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆29 · Updated last month