☆85Feb 3, 2025Updated last year
Alternatives and similar repositories for DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k
Users that are interested in DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k are comparing it to the libraries listed below.
- GRPO Training Script for Qwen Model on GSM8K Dataset. This script trains a Qwen model using the GRPO (Generalized Reinforcement Policy Op…☆32Dec 11, 2025Updated 6 months ago
- Examples for learning assembly language☆10Aug 5, 2021Updated 4 years ago
- ☆12Dec 22, 2025Updated 5 months ago
- Created a simple neural network using C++17 standard and the Eigen library that supports both forward and backward propagation.☆11Jul 27, 2024Updated last year
- Dataset corresponding to the paper: "Form2Seq : A Framework for Higher-Order Form Structure Extraction"☆10Feb 17, 2021Updated 5 years ago
- This repo is the artifact of FUEL☆16May 19, 2026Updated 3 weeks ago
- (NBCE) Naive Bayes-based Context Extension on ChatGLM-6b☆15Jun 7, 2023Updated 3 years ago
- [ACL'24 Findings] Official code for "TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback"☆12Dec 6, 2024Updated last year
- ☆12Feb 28, 2025Updated last year
- ☆26Mar 21, 2024Updated 2 years ago
- [AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework☆55Jun 8, 2026Updated last week
- ☆28Sep 15, 2025Updated 8 months ago
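- Adds some files missing from Visualglm so that Visualglm can be trained; the example performs physiognomy (face reading) recognition on faces☆13Jun 7, 2023Updated 3 years ago
- Abandon illusions, stay prepared, be ready to interview at any time☆14Dec 17, 2025Updated 5 months ago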
- Load Tensorflow pb file using Bert/TextCNNs, an ensemble model using Java.☆10Aug 20, 2021Updated 4 years ago
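- 2024 CCF International AIOps Challenge, Track 2 (GLM4): solution write-up for the retrieval-augmented operations and maintenance knowledge Q&A challenge.☆14Jul 5, 2024Updated last year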
- Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse'☆14Aug 2, 2024Updated last year
- The code implementation of MuScleLoRA (Accepted in ACL 2024)☆10Dec 1, 2024Updated last year
- [CIKM 2025] LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking☆18Sep 6, 2025Updated 9 months ago
- Courses in UCAS☆14Jun 12, 2023Updated 3 years ago
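- Fine-tunes the Qwen1.5-0.5B-Chat model on a general information extraction task, aiming to: evaluate the generative approach against extractive NER; give beginners a simple model fine-tuning workflow with as little code as possible; and handle data formatting for large-model training.☆14Sep 6, 2024Updated last year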
- a demo for how to execute bert_base_chinese based model in java☆10Mar 8, 2019Updated 7 years ago
- Using Siamese LSTM to classify duplicate Quora questions. Attempted pretrained BERT embeddings, Word2Vec and training own embeddings toget…☆10Aug 28, 2020Updated 5 years ago
- Chinese error correction using a pinyin tree and edit distance☆13Jul 19, 2019Updated 6 years ago
- Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use☆30Nov 4, 2025Updated 7 months ago
- Third-place solution for the Tianchi competition [NLP] Medical Search Query Relevance Judgment☆36Mar 11, 2023Updated 3 years ago
- ☆42Jun 11, 2025Updated last year
- Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"☆17Feb 26, 2026Updated 3 months ago
- Backdooring Neural Code Search☆14Sep 8, 2023Updated 2 years ago
- This is the repository for our paper: Untying the Reversal Curse via Bidirectional Language Model Editing☆11May 25, 2025Updated last year
- Simple tool to extract icons from a pe file and other useful information☆13Jun 22, 2018Updated 7 years ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆95Nov 13, 2025Updated 7 months ago
- ✨ Ceynri's personal website.☆17Jun 21, 2025Updated 11 months ago
- ☆22Jul 15, 2024Updated last year
- Benchmark, Toolbox, and Reflection-based Method for Clinical Agent☆22Nov 6, 2024Updated last year
- ☆33May 9, 2025Updated last year
- Code of paper "A Video Dataset for Falling Object Detection around Buildings" https://arxiv.org/abs/2408.05750☆19Jul 10, 2025Updated 11 months ago
- Parses a document (scanned or phone captured) and returns the underlying question - answer layout structured capture by LayoutXLM model☆10Jun 14, 2021Updated 5 years ago
- A simple tool which can automatically generate unity3d animator controller.☆14Jan 22, 2016Updated 10 years ago