llm & rl
☆288 · Oct 24, 2025 · Updated 7 months ago
Alternatives and similar repositories for llm_rl
Users who are interested in llm_rl are comparing it to the libraries listed below.
- MLLM @ Game · ☆17 · May 12, 2025 · Updated last year
- ☆17 · Jul 31, 2025 · Updated 10 months ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework · ☆21,969 · Updated this week
- Mathematical Foundations of Modern Artificial Intelligence · ☆51 · Dec 1, 2025 · Updated 6 months ago
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL · ☆5,006 · Apr 6, 2026 · Updated 2 months ago
- ☆145 · Sep 29, 2024 · Updated last year
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… · ☆9,652 · Jun 9, 2026 · Updated last week
- Reproduce R1 Zero on Logic Puzzle · ☆2,451 · Mar 20, 2025 · Updated last year
- This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu. · ☆17 · Sep 13, 2024 · Updated last year
- Medical Matting · ☆29 · Feb 21, 2023 · Updated 3 years ago
- Reproductions of LLM-related algorithms, plus some study notes · ☆3,408 · Mar 21, 2026 · Updated 2 months ago
- ☆18 · Jul 10, 2024 · Updated last year
- A local search system implementation using Elasticsearch for Wikipedia data indexing and retrieval. · ☆14 · May 17, 2025 · Updated last year
- A reconstruction of RL Introduction and its course materials for an easier entry into the field · ☆21 · Mar 4, 2026 · Updated 3 months ago
- exFM - FM model with some useful extensions · ☆17 · Dec 20, 2024 · Updated last year
- Deep Learning Introduction 2: Build Your Own Framework (companion code for the book) · ☆40 · May 27, 2024 · Updated 2 years ago
- A small open-source 3D agent simulator based on LLMs. · ☆70 · Dec 1, 2024 · Updated last year
- ☆52 · Oct 20, 2025 · Updated 7 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective · ☆1,261 · Aug 27, 2025 · Updated 9 months ago
- modern AI for beginners · ☆232 · Sep 9, 2025 · Updated 9 months ago
- A very simple GRPO implementation for reproducing R1-like LLM thinking. · ☆1,690 · Nov 21, 2025 · Updated 6 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. · ☆768 · May 10, 2026 · Updated last month
- ☆76 · May 22, 2025 · Updated last year
- Fully open reproduction of DeepSeek-R1 · ☆26,295 · Apr 2, 2026 · Updated 2 months ago
- Recommender-In-Detail is a package which offers detailed implementations of state-of-the-art techniques and basic methods in recommendati… · ☆19 · Sep 12, 2019 · Updated 6 years ago
- Custom reward development on top of verl · ☆178 · May 2, 2026 · Updated last month
- Official implementation of the paper: "A deeper look at depth pruning of LLMs" · ☆15 · Jul 24, 2024 · Updated last year
- Using the Qwen-2.5 model for text classification (LoRA) · ☆24 · May 7, 2025 · Updated last year
- Train transformer language models with reinforcement learning. · ☆18,613 · Jun 11, 2026 · Updated last week
- ☆21 · May 19, 2025 · Updated last year
- Train your Agent model via our easy and efficient framework · ☆1,763 · Dec 5, 2025 · Updated 6 months ago
- Source code for a bilibili video course · ☆466 · Nov 14, 2023 · Updated 2 years ago
- Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn… · ☆19 · Dec 27, 2024 · Updated last year
- Train a 1B LLM on 1T tokens from scratch as an individual · ☆807 · Apr 27, 2025 · Updated last year
- ☆21 · Apr 16, 2025 · Updated last year
- PyTorch distributed tutorials · ☆176 · Jun 16, 2025 · Updated last year
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka · ☆325 · Jun 21, 2025 · Updated 11 months ago
- [TMLR] Process Reward Models That Think · ☆89 · Nov 29, 2025 · Updated 6 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25] · ☆266 · May 5, 2026 · Updated last month