kyegomez / OpenR1
An open source implementation of R1
☆28Updated 2 weeks ago
Alternatives and similar repositories for OpenR1
Users who are interested in OpenR1 are comparing it to the repositories listed below
- ☆94Updated 8 months ago
- ☆46Updated 2 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆120Updated 9 months ago
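- The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all your need. Backed by experiments: for strong reasoning ability, the "think" process content is the core of AGI/ASI.☆44Updated 6 months ago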
- Efficient Agent Training for Computer Use☆122Updated 2 months ago
- ☆91Updated this week
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆86Updated 4 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆113Updated 2 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization"☆80Updated 2 months ago
- ☆65Updated last year
- ☆77Updated 11 months ago
- ☆103Updated 8 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness☆80Updated last month
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆33Updated last year
- ☆96Updated this week
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking☆38Updated 6 months ago
- ☆83Updated last year
- ☆90Updated 2 months ago
- ☆66Updated 4 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆99Updated 2 months ago
- Official code repository for Sketch-of-Thought (SoT)☆125Updated 3 months ago
- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. It uses zero-RL, SFT, and SFT+RL to elicit the LLM's ability to self-verify and reflect. Clean, minimal, accessible reproduction of Dee…☆24Updated 4 months ago
- [ICLR 2025] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning https://arxiv.org/abs/2501.06590☆64Updated last week
- The code for paper: Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search☆51Updated last month
- ☆59Updated 8 months ago
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra…☆54Updated 3 weeks ago
- Code and Data for the paper "Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works".☆19Updated last year
- Copy the MLP of llama3 8 times as 8 experts, create a router with random initialization, and add a load balancing loss to construct an 8x8b Mo…☆27Updated last year
- Data preparation code for CrystalCoder 7B LLM☆45Updated last year
- Minimal implementation of the "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" paper (ArXiv 2023, 2401.01335)☆29Updated last year