☆85Feb 3, 2025Updated last year
Alternatives and similar repositories for DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
Users who are interested in DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k are comparing it to the libraries listed below
- Implementing AlphaGo Zero from scratch☆17Nov 10, 2024Updated last year
- Train your own Chinese embedding model☆28Jan 6, 2025Updated last year
- ☆13Mar 16, 2025Updated 11 months ago
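- 2024 CCF International AIOps Challenge, Track 2 (GLM4): a shared solution for the retrieval-augmented operations and maintenance knowledge Q&A challenge.☆14Jul 5, 2024Updated last year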
- A demonstration of how to train a custom tokenizer similar to TikToken.☆15Jan 6, 2025Updated last year
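- (Work in progress) This project aims to develop a framework, based on large language models (LLMs), for building dialogue games. It supports rapid design and intelligent NPC construction for dialogue-driven games such as DND (Dungeons & Dragons), Werewolf, and text adventures, and improves the LLM response experience in such games.☆17Dec 2, 2025Updated 3 months ago
- Chinese edition of hf-alignment-handbook: a complete set of LLM training tutorials covering SFT, DPO, ORPO, and CPT.☆14Aug 25, 2024Updated last year
- Fine-tunes Alibaba's open-source text detection model, using OCR results returned by the 合合 recognition service as the initial training data, so that the model better fits a specific scenario of 10,000 images and text recognition accuracy improves.☆10Dec 9, 2024Updated last year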
- ☆12Mar 28, 2025Updated 11 months ago
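- Fine-tunes the Qwen1.5-0.5B-Chat model on a general information extraction task, aiming to: verify how a generative approach performs compared with extractive NER; give beginners a simple model fine-tuning workflow with as little code as possible; and handle data formatting for LLM training.☆15Sep 6, 2024Updated last year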
- ☆22Jul 15, 2024Updated last year
- GRPO Training Script for Qwen Model on GSM8K Dataset. This script trains a Qwen model using the GRPO (Generalized Reinforcement Policy Op…☆28Dec 11, 2025Updated 2 months ago
- vLLM hybrid inference extension plugin with support for multi-NUMA hybrid inference; single-GPU inference of the Qwen3-Next model can reach 1000+ prefill☆31Nov 7, 2025Updated 4 months ago
- ☆11Updated this week
- A record of the courseware code used in bilibili video lectures☆26Mar 3, 2026Updated last week
- Gemma2(9B), Llama3-8B-Finetune-and-RAG, code base for sample, implemented in Kaggle platform☆22Feb 8, 2025Updated last year
- Revision of official yolov7-pose to support custom dataset for keypoint detection☆11Nov 12, 2023Updated 2 years ago
- An extended project of the LLM Compiler paper, focusing on developing LLM-based Autonomous Agents.☆26Oct 22, 2024Updated last year
- ☆26Mar 21, 2024Updated last year
- Detecting car parking slots in open car park spaces☆13Oct 21, 2019Updated 6 years ago
- Accelerating LLM inference frameworks to make LLMs fly☆24May 10, 2024Updated last year
- Reading, study, and video-sharing notes on the book "Reinforcement Learning"☆78Apr 1, 2025Updated 11 months ago
- ☆76Jan 24, 2025Updated last year
- ☆42Mar 6, 2025Updated last year
- A complete introduction to building generative AI apps with Dify☆17May 25, 2025Updated 9 months ago
- ☆26Feb 28, 2026Updated last week
- A simple WeChat Official Account layout tool based on Dify☆17Jun 27, 2025Updated 8 months ago
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…☆28Feb 13, 2026Updated 3 weeks ago
- ☆28Dec 4, 2025Updated 3 months ago
- Use yolov5 to realize the road occupation operation and vehicle parking violation detection in urban streets, and can independently delin…☆12Jan 2, 2023Updated 3 years ago
- The classic movies redux with machine learning using TensorFlow and Keras.☆11Feb 12, 2019Updated 7 years ago
- Writes database metadata into the Dify knowledge base☆12Dec 30, 2025Updated 2 months ago
- ☆11Aug 29, 2025Updated 6 months ago
- HealthiVert-GAN, a novel deep-learning framework designed to generate pseudo-healthy vertebral images. These images simulate the pre-frac…☆11Nov 3, 2025Updated 4 months ago
- ☆17Feb 6, 2025Updated last year
- Workflow automation, but you just describe what you want and it happens.☆27Nov 22, 2025Updated 3 months ago
- Blog information☆42Mar 3, 2026Updated last week
- This is a fork from Ryan Carson's AI Dev Tasks repository, with some code cleanup and refactoring to enable support for PostgreSQL databa…☆15Sep 8, 2025Updated 6 months ago
- ☆14Jun 15, 2023Updated 2 years ago