This is a reproduction of open-r1: GRPO training is applied to the 0.5B, 1.5B, 3B, and 7B Qwen models, and some interesting phenomena are observed.
☆56Apr 13, 2025Updated 10 months ago
Alternatives and similar repositories for open-r1-reprod
Users that are interested in open-r1-reprod are comparing it to the libraries listed below
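- Integrates advanced large language models such as Qwen and DeepSeek, supports both an LLM + classification-layer mode and an LLM + LoRA + classification-layer mode, and uses the modular design and training of transformers so that components can be adjusted or replaced as needed.☆19Sep 1, 2025Updated 6 months ago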
- ☆11Updated this week
- ☆26Nov 26, 2024Updated last year
- ☆26Feb 28, 2026Updated last week
- A simple WeChat Official Account layout tool based on Dify☆17Jun 27, 2025Updated 8 months ago
- A complete introduction to building generative AI apps with Dify☆17May 25, 2025Updated 9 months ago
- ☆42Mar 6, 2025Updated last year
- Official completion of “Training on the Benchmark Is Not All You Need”.☆39Dec 31, 2024Updated last year
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆41May 20, 2025Updated 9 months ago
- Write the database metadata into the dify knowledge☆12Dec 30, 2025Updated 2 months ago
- ☆11May 16, 2025Updated 9 months ago
- ☆12Jan 31, 2025Updated last year
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…☆28Feb 13, 2026Updated 3 weeks ago
- Workflow automation, but you just describe what you want and it happens.☆27Nov 22, 2025Updated 3 months ago
- ☆28Dec 4, 2025Updated 3 months ago
- ☆11Aug 29, 2025Updated 6 months ago
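- This repository stores solutions to problems on the Hunan Institute of Science and Technology OJ☆11Oct 7, 2021Updated 4 years ago
- Dify knowledge base retrieval tool☆13Apr 3, 2025Updated 11 months ago
- Zhiyu AI: From Learner to Researcher☆13Jan 20, 2025Updated last year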
- MPLS VPNs (VPLS, VPWS, L3VPN) on eNSP using Huawei Routers☆11Feb 11, 2020Updated 6 years ago
- ☆12Jun 28, 2024Updated last year
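- LangReact is a configuration-driven Planning Agent application development tool that, through configuration and plugins, can quickly add Planning capabilities to your GPT application.☆12Apr 23, 2024Updated last year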
- An SSH plugin for Dify☆13Jan 16, 2026Updated last month
- 🤖AI Agents for Financial Trading💰: LLM-Driven Stock Prediction & Investment Recommendation System☆13Apr 14, 2025Updated 10 months ago
- Java implementation for the Agent2Agent Protocol (A2A - https://github.com/google/A2A), enabling interaction between AI agents through a …☆11Apr 21, 2025Updated 10 months ago
- ☆28Jun 27, 2025Updated 8 months ago
- Documentation at☆14Mar 27, 2025Updated 11 months ago
- A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gp…☆16Mar 11, 2025Updated last year
- A plugin for OpenCode. Make your coding agent learn and grow with every task.☆36Jan 31, 2026Updated last month
- An open-source project modeled on the Shanghai Jiao Tong University Survival Manual☆16Sep 25, 2024Updated last year
- ☆10Dec 29, 2023Updated 2 years ago
- Python Telegraph api.☆15Mar 22, 2025Updated 11 months ago
- A small framework to benchmark forecasting models via backtesting☆13Nov 25, 2023Updated 2 years ago
- AlphaGo Zero Reinforcement Learning Sokoban Solver☆11Jun 20, 2018Updated 7 years ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆52Jul 15, 2025Updated 7 months ago
- This is a fork from Ryan Carson's AI Dev Tasks repository, with some code cleanup and refactoring to enable support for PostgreSQL databa…☆15Sep 8, 2025Updated 6 months ago
- Use the knowledge graph generated by GraphRAG as the external knowledge base for the Dify workflow.☆21Jun 4, 2025Updated 9 months ago
- A universal skills runtime framework SDK for building, deploying, and executing modular capabilities across diverse environments.☆27Mar 3, 2026Updated last week
- A multi-model AI council CLI that provides consensus-driven decisions using Claude, Codex, and Gemini.☆25Dec 10, 2025Updated 3 months ago