XueruiSu / Reproduce-DeepSeek-R1-Survey
This repository collects various works that reproduce DeepSeek R1, as well as works related to DeepSeek R1 and the DeepSeek series.
☆18Updated 3 months ago
Alternatives and similar repositories for Reproduce-DeepSeek-R1-Survey
Users that are interested in Reproduce-DeepSeek-R1-Survey are comparing it to the libraries listed below
- Awesome Long-CoT Data☆16Updated 4 months ago
- ☆14Updated last year
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference☆185Updated this week
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)☆27Updated 3 weeks ago
- ☆13Updated 8 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆319Updated last year
- ☆33Updated last year
- Reference implementation for Token-level Direct Preference Optimization(TDPO)☆146Updated 5 months ago
- ☆19Updated 8 months ago
- ☆125Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆167Updated 2 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"☆100Updated last month
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples☆39Updated 3 weeks ago
- ☆263Updated 2 months ago
- This is my attempt to create a Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…☆35Updated last month
- Repository for AAAI 2024 paper "From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecu…☆22Updated last year
- [ICLR 2025] <MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses>☆46Updated last month
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆133Updated 3 weeks ago
- ☆255Updated last month
- Repo of paper "Free Process Rewards without Process Labels"☆161Updated 4 months ago
- Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"☆85Updated last month
- ☆15Updated 2 weeks ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆282Updated last month
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆85Updated 11 months ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"☆130Updated 10 months ago
- A comprehensive collection of process reward models.☆99Updated 2 weeks ago
- ☆47Updated 9 months ago
- A tiny-version reproduction of DeepSeek R1 Zero, run on two A100s.☆70Updated 6 months ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…☆16Updated last year
- ☆10Updated 5 months ago