sii-research / siiRL
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
☆327Updated this week
Alternatives and similar repositories for siiRL
Users that are interested in siiRL are comparing it to the libraries listed below
- Implementation for FP8/INT8 Rollout for RL training without performance drop.☆281Updated 2 months ago
- ☆128Updated last month
- Training VLM agents with multi-turn reinforcement learning☆365Updated last week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆330Updated 8 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…☆191Updated last month
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆146Updated 9 months ago
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models☆380Updated 3 weeks ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…☆96Updated 4 months ago
- ☆185Updated last week
- Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.☆257Updated last week
- MiroRL is an MCP-first reinforcement learning framework for deep research agent.☆186Updated 4 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934☆177Updated 2 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆190Updated 9 months ago
- 青稞Talk☆181Updated this week
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning.☆247Updated 4 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆347Updated 3 months ago
- Towards a Unified View of Large Language Model Post-Training☆199Updated 4 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"☆764Updated last month
- ☆126Updated 7 months ago
- A set of examples based on verl for end-to-end RL training recipes.☆108Updated this week
- ☆109Updated 3 months ago
- ☆208Updated 2 months ago
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference☆224Updated 3 months ago
- ☆118Updated 9 months ago
- ☆213Updated last month
- repo for paper https://arxiv.org/abs/2504.13837☆310Updated 3 weeks ago
- ☆97Updated last month
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter☆121Updated last month
- VLA-Arena is an open-source benchmark for systematic evaluation of Vision-Language-Action (VLA) models.☆88Updated last week
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning☆329Updated 7 months ago