haoyangliu123 / awesome-deepseek-r1
A collection of recent reproduction papers and projects on DeepSeek-R1
☆27 · Updated last month
Alternatives and similar repositories for awesome-deepseek-r1:
Users interested in awesome-deepseek-r1 are comparing it to the libraries listed below.
- SOTA RL fine-tuning solution for advanced math reasoning of LLMs ☆91 · Updated this week
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆72 · Updated 7 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆119 · Updated 8 months ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆153 · Updated this week
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆165 · Updated 2 months ago
- ☆186 · Updated this week
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆93 · Updated 11 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆52 · Updated 4 months ago
- Awesome RL-based LLM Reasoning ☆341 · Updated last week
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆179 · Updated last year
- [NeurIPS 2024] The official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ☆104 · Updated last week
- ☆105 · Updated 6 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆138 · Updated 3 weeks ago
- This is my attempt to create a self-correcting LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆31 · Updated 3 months ago
- Direct preference optimization with f-divergences ☆13 · Updated 4 months ago
- ☆54 · Updated 5 months ago
- ☆117 · Updated 2 weeks ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆61 · Updated 3 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆124 · Updated 3 months ago
- ☆28 · Updated last month
- A research repo for experiments on reinforcement fine-tuning ☆36 · Updated last week
- ☆36 · Updated this week
- Source code for Self-Evaluation Guided MCTS for online DPO ☆299 · Updated 7 months ago
- Paper list for efficient reasoning ☆311 · Updated this week
- ☆61 · Updated 4 months ago
- ☆22 · Updated last week
- Paper list on inference/test-time scaling and computing ☆127 · Updated last week
- Repo for the paper "Free Process Rewards without Process Labels" ☆138 · Updated 2 weeks ago
- Accepted LLM papers at NeurIPS 2024 ☆34 · Updated 5 months ago
- This is an implementation of the paper "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" from Google De… ☆25 · Updated 3 months ago