PKU-Alignment / eval-anything
☆10 · Updated 2 weeks ago
Alternatives and similar repositories for eval-anything:
Users interested in eval-anything are comparing it to the libraries listed below.
- SOTA RL fine-tuning solution for advanced math reasoning of LLMs ☆103 · Updated last week
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆93 · Updated last year
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆169 · Updated 3 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 4 months ago
- ICLR 2025 Agent-Related Papers ☆62 · Updated 5 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆160 · Updated this week
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆35 · Updated 11 months ago
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de… ☆51 · Updated 9 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆161 · Updated last year
- The official code repository for PRMBench. ☆72 · Updated 2 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆74 · Updated 7 months ago
- A comprehensive collection of process reward models. ☆53 · Updated last week
- This is my attempt to create a Self-Correcting-LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆33 · Updated 2 weeks ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆136 · Updated 2 months ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆62 · Updated 3 months ago
- Paper collection of the continuing effort starting from World Models. ☆170 · Updated 9 months ago
- ☆131 · Updated 3 months ago
- ☆54 · Updated 6 months ago
- SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enh… ☆30 · Updated 7 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆122 · Updated 9 months ago
- Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆336 · Updated 4 months ago
- ☆85 · Updated this week
- ☆126 · Updated 9 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆60 · Updated this week
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆144 · Updated last month
- [ICLR 2024 Spotlight] Code for the paper "Text2Reward: Reward Shaping with Language Models for Reinforcement Learning" ☆154 · Updated 4 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆60 · Updated this week
- ☆184 · Updated last month
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆28 · Updated last month
- Paper collection of methods that use language to interact with an environment, including the real world, simulated worlds, or the WWW… ☆127 · Updated last year