qiufengqijun / open-r1-reprod
A reproduction of open-r1: GRPO training was run on Qwen models at 0.5B, 1.5B, 3B, and 7B, and some interesting phenomena were observed.
☆56Apr 13, 2025Updated 10 months ago
Alternatives and similar repositories for open-r1-reprod
Users that are interested in open-r1-reprod are comparing it to the libraries listed below
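- Integrates advanced large language models such as Qwen and DeepSeek, supports both a pure LLM + classification layer mode and an LLM + LoRA + classification layer mode, and uses the modular design of transformers for training so that components can be adjusted or replaced as needed.☆19Sep 1, 2025Updated 5 months ago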
- ☆11Updated this week
- ☆26Nov 26, 2024Updated last year
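- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own. About Clean, minimal, accessible reproduction of Dee…☆33Apr 5, 2025Updated 10 months ago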
- A Complete Introduction to Building Generative AI Apps with Dify☆17May 25, 2025Updated 8 months ago
- A simple WeChat Official Account layout tool based on Dify☆16Jun 27, 2025Updated 7 months ago
- ☆42Mar 6, 2025Updated 11 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”.☆39Dec 31, 2024Updated last year
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆41May 20, 2025Updated 8 months ago
- 100 Production-Ready Claude Code Skills - The most comprehensive collection of AI skills for sales, business automation, content creation…☆35Oct 22, 2025Updated 3 months ago
- ☆28Dec 4, 2025Updated 2 months ago
- ☆11Aug 29, 2025Updated 5 months ago
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…☆25Jan 6, 2026Updated last month
- This repository stores solutions to problems on the Hunan Institute of Science and Technology OJ (online judge).☆11Oct 7, 2021Updated 4 years ago
- Workflow automation, but you just describe what you want and it happens.☆26Nov 22, 2025Updated 2 months ago
- Write the database metadata into the dify knowledge☆12Dec 30, 2025Updated last month
- ☆11May 16, 2025Updated 9 months ago
- ☆11Jan 31, 2025Updated last year
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆51Jul 15, 2025Updated 7 months ago
- 🤖AI Agents for Financial Trading💰: LLM-Driven Stock Prediction & Investment Recommendation System☆13Apr 14, 2025Updated 10 months ago
- Zhiyu AI: From Learner to Researcher☆13Jan 20, 2025Updated last year
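- LangReact is a configuration-driven development tool for Planning Agent applications; through configuration and plugins, it can quickly add Planning capabilities to your GPT application.☆12Apr 23, 2024Updated last year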
- ☆10Apr 30, 2025Updated 9 months ago
- Use the knowledge graph generated by GraphRAG as the external knowledge base for the Dify workflow.☆21Jun 4, 2025Updated 8 months ago
- ☆12Jun 28, 2024Updated last year
- Dify knowledge base retrieval tool☆13Apr 3, 2025Updated 10 months ago
- This is a fork from Ryan Carson's AI Dev Tasks repository, with some code cleanup and refactoring to enable support for PostgreSQL databa…☆15Sep 8, 2025Updated 5 months ago
- A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gp…☆16Mar 11, 2025Updated 11 months ago
- ☆28Jun 27, 2025Updated 7 months ago
- A small framework to benchmark forecasting models via backtesting☆13Nov 25, 2023Updated 2 years ago
- Python Telegraph api.☆15Mar 22, 2025Updated 10 months ago
- Documentation at☆14Mar 27, 2025Updated 10 months ago
- ☆28Feb 10, 2026Updated last week
- A multi-agent framework to help with your homework.☆10Mar 1, 2025Updated 11 months ago
- AlphaGo Zero Reinforcement Learning Sokoban Solver☆11Jun 20, 2018Updated 7 years ago
- ☆10Dec 29, 2023Updated 2 years ago
- MPLS VPNs (VPLS, VPWS, L3VPN) on eNSP using Huawei Routers☆11Feb 11, 2020Updated 6 years ago
- Java implementation for the Agent2Agent Protocol (A2A - https://github.com/google/A2A), enabling interaction between AI agents through a …☆11Apr 21, 2025Updated 9 months ago
- An SSH plugin for Dify☆12Jan 16, 2026Updated last month