XueruiSu / Reproduce-DeepSeek-R1-Survey
This repository collects various works that reproduce DeepSeek R1, as well as works related to DeepSeek R1 and the DeepSeek series.
☆16 Updated 3 weeks ago
Alternatives and similar repositories for Reproduce-DeepSeek-R1-Survey
Users who are interested in Reproduce-DeepSeek-R1-Survey are comparing it to the repositories listed below.
- MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion ☆19 Updated this week
- Awesome Long-CoT Data ☆14 Updated last month
- Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective" ☆20 Updated last year
- ☆13 Updated last year
- Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples ☆33 Updated last month
- ☆31 Updated last year
- ☆120 Updated 10 months ago
- ☆14 Updated last year
- ☆17 Updated 11 months ago
- ☆10 Updated 5 months ago
- ☆17 Updated 11 months ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization… ☆14 Updated last year
- Source code of LatentOps ☆78 Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆138 Updated 3 months ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆51 Updated 6 months ago
- ☆66 Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 8 months ago
- Structured Chemistry Reasoning with Large Language Models ☆38 Updated last year
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆40 Updated last month
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆76 Updated 8 months ago
- ☆67 Updated last year
- ☆20 Updated 5 months ago
- Direct preference optimization with f-divergences. ☆13 Updated 6 months ago
- Extending context length of visual language models ☆11 Updated 5 months ago
- Repository for AAAI 2024 paper "From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecu… ☆22 Updated last year
- ☆22 Updated last year
- This is my attempt to create a Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆35 Updated last month
- ☆26 Updated 2 years ago
- [ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models ☆21 Updated 10 months ago
- Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation [NAACL 2024] ☆94 Updated last year