XueruiSu / Reproduce-DeepSeek-R1-Survey
This repository collects various works that reproduce DeepSeek R1, as well as works related to DeepSeek R1 and the DeepSeek series.
☆18 · Updated 2 months ago
Alternatives and similar repositories for Reproduce-DeepSeek-R1-Survey
Users who are interested in Reproduce-DeepSeek-R1-Survey are comparing it to the libraries listed below.
- Awesome Long-CoT Data ☆15 · Updated 3 months ago
- ☆13 · Updated last year
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆157 · Updated last month
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples ☆38 · Updated 3 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆318 · Updated 11 months ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization… ☆16 · Updated last year
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆129 · Updated this week
- ☆19 · Updated 7 months ago
- Repository for AAAI 2024 paper "From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecu… ☆22 · Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆165 · Updated 2 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" ☆96 · Updated 2 weeks ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆141 · Updated 5 months ago
- ☆33 · Updated last year
- [ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ☆74 · Updated 5 months ago
- ☆20 · Updated 6 months ago
- ☆123 · Updated last year
- ☆242 · Updated 2 weeks ago
- This is my attempt to create Self-Correcting-LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆35 · Updated last week
- Repo of paper "Free Process Rewards without Process Labels" ☆154 · Updated 4 months ago
- ☆241 · Updated last month
- GenRM-CoT: Data release for verification rationales ☆63 · Updated 9 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆234 · Updated last year
- [ACL 2024] Unveiling Linguistic Regions in Large Language Models ☆31 · Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆83 · Updated 10 months ago
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? ☆30 · Updated last month
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆235 · Updated last month
- ☆31 · Updated last year
- Collection of latest papers and materials in the area of RLVR! ☆16 · Updated last month
- ☆202 · Updated 3 months ago
- ☆8 · Updated 4 months ago