waylandzhang / DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
☆85Feb 3, 2025Updated last year
Alternatives and similar repositories for DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
Users who are interested in DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k are comparing it to the repositories listed below
- AlphaGo Zero implemented from scratch☆17Nov 10, 2024Updated last year
- Train your own Chinese embedding model☆28Jan 6, 2025Updated last year
- ☆13Mar 16, 2025Updated 11 months ago
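- 2024 CCF International AIOps Challenge, Track 2 (GLM4): a shared solution for the retrieval-augmented operations and maintenance knowledge Q&A challenge.☆14Jul 5, 2024Updated last year
- (In progress) This project aims to develop a framework for building dialogue games based on large language models (LLMs), supporting rapid design and intelligent NPC construction for dialogue-based games such as DND (Dungeons & Dragons), Werewolf, and text-based games, and improving the LLM response experience in such games.☆17Dec 2, 2025Updated 2 months ago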
- A demonstration of how to train a custom tokenizer similar to TikToken.☆16Jan 6, 2025Updated last year
- ☆12Mar 28, 2025Updated 10 months ago
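- Chinese version of hf-alignment-handbook: a complete set of LLM training tutorials covering SFT, DPO, ORPO, and CPT.☆14Aug 25, 2024Updated last year
- Fine-tunes Alibaba's open-source text detection model, using OCR results returned by 合合 recognition as initial training data, to optimize the model for a specific scenario of 10,000 images and improve text recognition accuracy.☆10Dec 9, 2024Updated last year
- Fine-tuning the Qwen1.5-0.5B-Chat model for general information extraction, aiming to: verify the effectiveness of the generative approach compared with extractive NER; give beginners a simple model fine-tuning workflow with as little code as possible; handle data formatting for LLM training.☆15Sep 6, 2024Updated last year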
- ☆11Updated this week
- vllm hybrid inference extension plugin, supporting multi-NUMA hybrid inference; single-GPU inference of the Qwen3-Next model can reach 1000+ prefill☆31Nov 7, 2025Updated 3 months ago
- ☆22Jul 15, 2024Updated last year
- Record of the courseware and code used in bilibili video lectures☆26Sep 23, 2025Updated 4 months ago
- Gemma2(9B), Llama3-8B-Finetune-and-RAG, code base for sample, implemented in Kaggle platform☆22Feb 8, 2025Updated last year
- Revision of official yolov7-pose to support custom dataset for keypoint detection☆11Nov 12, 2023Updated 2 years ago
- Detecting car parking slots in open car park spaces☆13Oct 21, 2019Updated 6 years ago
- ☆26Mar 21, 2024Updated last year
- A collection of papers about knowledge distillation in autonomous driving.☆29Mar 26, 2024Updated last year
- Accelerating LLM inference frameworks to make LLMs fly☆24May 10, 2024Updated last year
- Reading, study, and video-sharing notes on "Reinforcement Learning"☆76Apr 1, 2025Updated 10 months ago
- ☆76Jan 24, 2025Updated last year
- A complete introduction to building generative AI apps with Dify☆17May 25, 2025Updated 8 months ago
- A simple WeChat Official Account layout tool based on Dify☆16Jun 27, 2025Updated 7 months ago
- ☆42Mar 6, 2025Updated 11 months ago
- Write the database metadata into the dify knowledge☆12Dec 30, 2025Updated last month
- HealthiVert-GAN, a novel deep-learning framework designed to generate pseudo-healthy vertebral images. These images simulate the pre-frac…☆11Nov 3, 2025Updated 3 months ago
- ☆28Dec 4, 2025Updated 2 months ago
- ☆17Feb 6, 2025Updated last year
- Use yolov5 to realize the road occupation operation and vehicle parking violation detection in urban streets, and can independently delin…☆12Jan 2, 2023Updated 3 years ago
- Workflow automation, but you just describe what you want and it happens.☆26Nov 22, 2025Updated 2 months ago
- The classic movies redux with machine learning using TensorFlow and Keras.☆11Feb 12, 2019Updated 7 years ago
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…☆25Jan 6, 2026Updated last month
- 100 Production-Ready Claude Code Skills - The most comprehensive collection of AI skills for sales, business automation, content creation…☆35Oct 22, 2025Updated 3 months ago
- ☆11Aug 29, 2025Updated 5 months ago
- Blog information☆42Feb 11, 2026Updated last week
- A simple example for PySpark based project.☆11Jun 3, 2016Updated 9 years ago
- This is an A/B test project from Udacity.☆12Dec 24, 2019Updated 6 years ago
- Python Telegraph API.☆15Mar 22, 2025Updated 10 months ago