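An ultra-simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the 24-point game as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own reasoning. About Clean, minimal, accessible reproduction of DeepSeek R1-Zero, DeepSeek R1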
☆34Apr 5, 2025Updated 11 months ago
Alternatives and similar repositories for 24-Game-Reasoning
Users that are interested in 24-Game-Reasoning are comparing it to the libraries listed below
- Support finetuning GLM4v with zero2☆16Jun 29, 2024Updated last year
- ☆11Feb 25, 2026Updated last week
- MCP DeepResearch Server: a deep research server built on LangGraph + Ollama + Tavily, with support for asynchronous execution, timeout control, and progress push updates☆31Jun 16, 2025Updated 8 months ago
- Long CoT Fine-Tuning and Reinforcement Learning for LLMs in the Context of the 24-Point Game: A Toy Project☆25Feb 22, 2025Updated last year
- A fluent, scalable, and easy-to-use LLM data processing framework.☆28Jan 31, 2026Updated last month
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.☆41Apr 4, 2025Updated 11 months ago
- diagnosis_zero: an R1-Zero reproduction for disease diagnosis☆34Jul 24, 2025Updated 7 months ago
- ☆23Updated this week
- Context-central multi-agent framework with PyTorch-like API. Build intelligent agent systems with minimal code.☆73Oct 26, 2025Updated 4 months ago
- A Complete Introduction to Building Generative AI Apps with Dify☆17May 25, 2025Updated 9 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Jul 17, 2024Updated last year
- A simple WeChat Official Account layout tool based on Dify☆17Jun 27, 2025Updated 8 months ago
- ☆39Aug 1, 2025Updated 7 months ago
- ☆41Apr 30, 2025Updated 10 months ago
- An auto coder that automatically fixes errors and improves code from a simple user prompt☆37Dec 27, 2024Updated last year
- ☆35Jul 8, 2025Updated 7 months ago
- Workflow automation, but you just describe what you want and it happens.☆27Nov 22, 2025Updated 3 months ago
- An open-source project modeled on the "Shanghai Jiao Tong University Survival Manual"☆16Sep 25, 2024Updated last year
- ☆11Aug 29, 2025Updated 6 months ago
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…☆28Feb 13, 2026Updated 3 weeks ago
- ☆28Dec 4, 2025Updated 3 months ago
- Write the database metadata into the dify knowledge☆12Dec 30, 2025Updated 2 months ago
- ☆65Jul 10, 2025Updated 7 months ago
- A reproduction of open-r1 that trains 0.5B, 1.5B, 3B, and 7B qwen models with GRPO and reports some interesting observations.☆56Apr 13, 2025Updated 10 months ago
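- This project presents prompt engineering use cases, including building a simulated intelligent recommendation and customer service system, along with advanced demos of question answering, chain of thought, self-consistency, and tree of thoughts, to help readers understand prompts. A single codebase supports multiple large models (such as OpenAI and Alibaba Tongyi Qianwen) and wraps the application as an API using FastAPI.☆52Sep 26, 2024Updated last year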
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues☆13Feb 7, 2024Updated 2 years ago
- This is the repository of the EnviroDetaNet☆13Sep 3, 2024Updated last year
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- A collection of some awesome public projects about LLM-based Web Agents and Tools.☆12Apr 25, 2024Updated last year
- Python Telegraph api.☆15Mar 22, 2025Updated 11 months ago
- An intelligent e-commerce customer service agent built on ReAct☆48Sep 19, 2024Updated last year
- ☆28Jun 27, 2025Updated 8 months ago
- New York Times Scraper☆11Feb 19, 2024Updated 2 years ago
- ☆10Dec 29, 2023Updated 2 years ago
- A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gp…☆16Mar 11, 2025Updated 11 months ago
- This is a fork from Ryan Carson's AI Dev Tasks repository, with some code cleanup and refactoring to enable support for PostgreSQL databa…☆15Sep 8, 2025Updated 5 months ago
- A universal skills runtime framework SDK for building, deploying, and executing modular capabilities across diverse environments.☆27Updated this week
- Automatically generates captions for an image using Image processing and NLP. Model was trained on Flickr30K dataset.☆11Jun 11, 2020Updated 5 years ago
- Java implementation for the Agent2Agent Protocol (A2A - https://github.com/google/A2A), enabling interaction between AI agents through a …☆11Apr 21, 2025Updated 10 months ago