anakin87/qwen-scheduler-grpo

Train a Language Model with GRPO to create a schedule from a list of events and priorities

☆272

Alternatives and similar repositories for qwen-scheduler-grpo

Users who are interested in qwen-scheduler-grpo are comparing it to the repositories listed below.

owenliang / qwen2.5-0.5b-grpo
Qwen2.5 0.5B GRPO
☆86 · Feb 16, 2025 · Updated last year
anakin87 / who-killed-laura-palmer
Simple Question Answering system, based on data crawled from Twin Peaks Wiki. It is built using 🔍 Haystack, an awesome open-source frame…
☆11 · Jun 22, 2023 · Updated 3 years ago
kossisoroyce / train_grpo.py
GRPO Training Script for Qwen Model on GSM8K Dataset. This script trains a Qwen model using the GRPO (Generalized Reinforcement Policy Op…
☆33 · Dec 11, 2025 · Updated 7 months ago
liunian-Jay / AgenticRAG-RL
A minimal implementation of Agentic RAG using GRPO
☆17 · Jun 11, 2025 · Updated last year
liuchen6667 / qwen_grpo_gsm8k
Simple, easy-to-understand code for strengthening math ability on Qwen using GRPO
☆58 · May 14, 2025 · Updated last year
ninehills / countdown
Countdown Game Distill&RL
☆48 · Sep 5, 2025 · Updated 10 months ago
dejan94it / cc_Rtools
This plugin allows the Cheshire Cat to use tools written in R language
☆10 · Dec 23, 2024 · Updated last year
Paul33333 / Agentic_RAG
Local DeepSearch (Advantage: Low Threshold): an implementation of Agentic RAG based on DeepSeek-R1 API and Tavily API
☆17 · Jun 21, 2025 · Updated last year
Simple-Efficient / RL-Factory
Train your Agent model via our easy and efficient framework
☆1,773 · Dec 5, 2025 · Updated 7 months ago
tyler-romero / microR1
Simple repository for training small reasoning models
☆51 · Feb 17, 2026 · Updated 5 months ago
lsdefine / simple_GRPO
A very simple GRPO implementation for reproducing r1-like LLM thinking.
☆1,698 · Nov 21, 2025 · Updated 8 months ago
foreveryh / langgraph-deep-research
☆248 · Jun 6, 2025 · Updated last year
kyegomez / SelfExtend
Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta
☆13 · Nov 11, 2024 · Updated last year
qiufengqijun / mini_qwen
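A project for training a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization (DPO); the model has 1B parameters and supports Chinese and English.
☆863 · Feb 18, 2025 · Updated last year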
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,081 · Updated this week
dhcode-cpp / X-R1
minimal-cost for training 0.5B R1-Zero
☆816 · May 14, 2025 · Updated last year
robertpiosik / CodeWebChat
Free AI coding with static context
☆1,386 · Updated this week
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,153 · Nov 13, 2025 · Updated 8 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,654 · Updated this week
LizabethLi / markdown-to-wechat-converter
☆48 · Jun 3, 2026 · Updated last month
NJUxlj / Travel-Agent-based-on-Qwen2-RLHF
A travel agent based on Qwen2.5, fine-tuned by SFT + DPO/PPO/GRPO using traveling question-answer dataset, a mindmap can be output using …
☆81 · Jul 6, 2026 · Updated 2 weeks ago
hhaoyan / mbtr
A code for calculating MBTR molecule/crystal structure representation. (https://doi.org/10.1088/2632-2153/aca005)
☆14 · Nov 15, 2022 · Updated 3 years ago
AaronFeng753 / Better-Qwen3
Auto Thinking Mode switch for Qwen3 in Open webui
☆72 · May 8, 2025 · Updated last year
ninehills / embedding_finetuning
Fine-tuning embedding models.
☆14 · Nov 25, 2024 · Updated last year
ruan11223344 / McpDocServer
A development documentation server based on the MCP protocol, designed for the documentation of various development frameworks
☆48 · Mar 31, 2025 · Updated last year
Agent-RL / ReCall
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei…
☆1,421 · May 16, 2025 · Updated last year
Wencho8 / ReAct-AI-Agent-from-Scratch-using-DeepSeek
ReAct AI Agent from Scratch using DeepSeek: Handling Memory & Tools without Frameworks
☆40 · Feb 18, 2025 · Updated last year
joeseesun / mcp-prompt-server
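A server based on the Model Context Protocol (MCP) that provides preset prompt templates according to the user's task requirements, helping Cline/Cursor/Windsurf... carry out various tasks more efficiently. The server returns the preset prompts as tools so that, in Cursor and…
☆649 · May 20, 2025 · Updated last year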
shangshang-wang / Tina
[ICLR 2026] Tina: Tiny Reasoning Models via LoRA
☆338 · Sep 23, 2025 · Updated 10 months ago
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,848 · Jul 14, 2026 · Updated last week
gauss5930 / iDUS
An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details.
☆14 · Mar 20, 2024 · Updated 2 years ago
and270 / thinking_effort_processor
☆93 · Jul 7, 2025 · Updated last year
wyf3 / llm_related
Reproductions of large-model-related algorithms, plus some study notes
☆3,466 · Jul 2, 2026 · Updated 3 weeks ago
OpenPipe / ART
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement…
☆10,529 · Updated this week
waltonfuture / RL-with-Cold-Start
SFT+RL boosts multimodal reasoning
☆47 · Jun 27, 2025 · Updated last year
foreveryh / mentis
Mentis: A powerful multi-agent orchestration framework built on LangGraph.
☆299 · May 16, 2025 · Updated last year
ianhohoho / auto-hyde
🔎 A deep-dive into HyDE for Advanced LLM RAG + 💡 Introducing AutoHyDE, a semi-supervised framework to improve the effectiveness, covera…
☆38 · Mar 26, 2024 · Updated 2 years ago
KRLabsOrg / LettuceDetect
Span-level grounding verification for RAG, code, and tool-grounded AI outputs.
☆588 · Updated this week
weise25 / LocalSite-ai
Generate Web Pages and Components with text prompts, with Local Models. (or Cloud Models, if you want)
☆404 · Jun 18, 2026 · Updated last month