A reproduction of open-r1 that applies GRPO training to the 0.5B, 1.5B, 3B, and 7B Qwen models and reports some interesting observations.
☆62Apr 13, 2025Updated last year
Alternatives and similar repositories for open-r1-reprod
Users that are interested in open-r1-reprod are comparing it to the libraries listed below.
- Load balancing based on reinforcement learning.☆11Oct 11, 2020Updated 5 years ago
- A mobile robot network routing protocol based on multi-agent reinforcement learning.☆15Jun 28, 2021Updated 4 years ago
- ☆11Feb 23, 2023Updated 3 years ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆55Jul 15, 2025Updated 10 months ago
- The code for pg2021 paper "Line Art Colorization Based on Explicit Region Segmentation"☆12Feb 7, 2022Updated 4 years ago
- This repo is used to train and run an OCR model based on the original CRNN, with its backbone changed to ResNet34.☆10Jan 15, 2021Updated 5 years ago
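- A project for training a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization. The model has 1B parameters and supports Chinese and English.☆841Feb 18, 2025Updated last year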
- Multi objective optimization-based routing algorithm for SDN networks☆20Aug 31, 2020Updated 5 years ago
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆45May 20, 2025Updated last year
- ☆14May 9, 2025Updated last year
- A curated collection of large language model interview questions.☆12Aug 29, 2024Updated last year
- Repository containing code for our paper Y. Zhang, D. Jiang, F. Shao, T. Wu, X. Liang and J. Chen, "Metaheuristic-Based Beam Scheduling S…☆27Apr 16, 2025Updated last year
- ☆13May 16, 2025Updated last year
- Repository for the Findings of ACL'23 paper Label Agnostic Pre-training for Zero-shot Text Classification☆12Aug 10, 2023Updated 2 years ago
- Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)☆18May 23, 2024Updated last year
- The first large scale formally verified reasoning dataset for Verilog☆21May 16, 2025Updated last year
- ☆105Jul 24, 2025Updated 9 months ago
- Code for ACM MM 2024 paper "A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning"☆19Dec 5, 2024Updated last year
- This repository is associated with the research paper titled ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large…☆15Jun 4, 2025Updated 11 months ago
- [IROS 2024 Oral Pitch] PyTorch Implementation of "Dual-Branch Graph Transformer Network for 3D Human Mesh Reconstruction from Video"☆15Jul 19, 2024Updated last year
- ☆15Apr 4, 2025Updated last year
- ☆15Mar 30, 2025Updated last year
- Greedy Perimeter Stateless Routing (GPSR) implementation on the NS3 platform☆21Aug 17, 2017Updated 8 years ago
- Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to …☆67Jan 28, 2026Updated 3 months ago
- Packet Routing Simulator for Multi-Agent Reinforcement Learning☆27Jul 23, 2024Updated last year
- ☆15Aug 7, 2025Updated 9 months ago
- Framework for the work "Large Language Models for Zero Touch Network Configuration Management"☆13Jun 20, 2024Updated last year
- A simple utility for doing RISC-V HPM perf monitoring.☆18May 8, 2017Updated 9 years ago
- Collection of datasets for network research.☆14Jul 26, 2020Updated 5 years ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion☆14Mar 17, 2025Updated last year
- ☆10Oct 30, 2023Updated 2 years ago
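- Tsinghua University, Department of Electronic Engineering: Python course project for the short term following the second freshman semester. A very rudimentary machine-learning-based face recognition system.☆10Sep 2, 2022Updated 3 years ago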
- MPLS VPNs (VPLS, VPWS, L3VPN) on eNSP using Huawei Routers☆11Feb 11, 2020Updated 6 years ago
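- A PyTorch-based training framework for image classification models, with each part modularized so the model is easy to modify. Includes classification models, training, validation, testing, pruning and retraining, visualization, ONNX export, and ONNX inference.☆17Nov 23, 2025Updated 5 months ago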
- Code for "Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models"☆18Mar 21, 2023Updated 3 years ago
- An LLM Mock Server that supports simulating the protocols of all LLM providers.☆14Oct 18, 2025Updated 7 months ago
- ☆13Jul 5, 2024Updated last year
- A collection of AI application examples.☆114Jun 3, 2024Updated last year
- Code for experimenting with load-balancing intradomain traffic engineering using GNNs and RL. Project as part of masters degree at the Un…☆38Jan 12, 2021Updated 5 years ago