sii-research / siiRL
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
☆309Updated this week
Alternatives and similar repositories for siiRL
Users interested in siiRL are comparing it to the libraries listed below.
- Implementation of FP8/INT8 rollout for RL training without a performance drop.☆280Updated last month
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆328Updated 7 months ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…☆92Updated 3 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆146Updated 8 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…☆186Updated last month
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆190Updated 8 months ago
- Training VLM agents with multi-turn reinforcement learning☆342Updated 2 weeks ago
- Towards a Unified View of Large Language Model Post-Training☆192Updated 3 months ago
- MiroRL is an MCP-first reinforcement learning framework for deep research agents.☆183Updated 3 months ago
- ☆121Updated 3 weeks ago
- ☆107Updated 3 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆337Updated 2 months ago
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning☆329Updated 6 months ago
- Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.☆238Updated this week
- repo for paper https://arxiv.org/abs/2504.13837☆288Updated 5 months ago
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning.☆245Updated 4 months ago
- ☆208Updated last month
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"☆736Updated 2 weeks ago
- 青稞Talk☆175Updated last week
- ☆124Updated 6 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning☆166Updated 3 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934☆158Updated last month
- TraceRL & TraDo-8B: Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models☆363Updated this week
- ☆319Updated 6 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training☆251Updated 4 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆115Updated 4 months ago
- Async pipelined version of Verl☆125Updated 8 months ago
- A Telegram bot to recommend arXiv papers☆289Updated last month
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆149Updated 2 months ago
- PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning☆177Updated last week