XueruiSu / Reproduce-DeepSeek-R1-Survey
This repository collects various works that reproduce DeepSeek R1, as well as works related to DeepSeek R1 and the DeepSeek series.
☆16 · Updated last month
Alternatives and similar repositories for Reproduce-DeepSeek-R1-Survey
Users who are interested in Reproduce-DeepSeek-R1-Survey are comparing it to the repositories listed below.
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆89 · Updated last week
- MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆24 · Updated last week
- Awesome Long-CoT Data ☆15 · Updated 2 months ago
- ☆13 · Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆79 · Updated 9 months ago
- ☆69 · Updated 6 months ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization… ☆16 · Updated last year
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples ☆36 · Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆140 · Updated 3 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 7 months ago
- Source code for Self-Evaluation Guided MCTS for online DPO ☆314 · Updated 10 months ago
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [EMNLP 2024] ☆25 · Updated 6 months ago
- ☆9 · Updated 6 months ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆53 · Updated 6 months ago
- ☆7 · Updated 3 months ago
- The official code repository for PRMBench ☆73 · Updated 3 months ago
- Model merging is a highly efficient approach for long-to-short reasoning ☆56 · Updated this week
- ☆31 · Updated last year
- ☆17 · Updated last year
- Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective" ☆20 · Updated last year
- ☆20 · Updated 5 months ago
- Structured Chemistry Reasoning with Large Language Models ☆38 · Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆54 · Updated 6 months ago
- ☆31 · Updated last year
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆95 · Updated 2 months ago
- Official repository for the ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆155 · Updated 2 weeks ago
- ☆121 · Updated 10 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆22 · Updated last year
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆19 · Updated last week
- A Sober Look at Language Model Reasoning ☆63 · Updated last week