NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences
☆605 · Updated last week
Alternatives and similar repositories for Long-RL
Users interested in Long-RL are comparing it to the repositories listed below.
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation (☆383, updated 4 months ago)
- Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning" (☆209, updated 4 months ago)
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey (☆447, updated 7 months ago)
- Official implementation of UnifiedReward & UnifiedReward-Think (☆531, updated last week)
- Official code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" (☆186, updated this week)
- EVE Series: Encoder-Free Vision-Language Models from BAAI (☆349, updated last month)
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video] (☆685, updated last week)
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" (☆283, updated 4 months ago)
- Long Context Transfer from Language to Vision (☆392, updated 5 months ago)
- [CVPR 2025] 🔥 Official implementation of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation" (☆379, updated last month)
- Visual Planning: Let's Think Only with Images (☆270, updated 3 months ago)
- The official repository for the paper "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning" (☆138, updated last week)
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation (☆456, updated 9 months ago)
- PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" (☆400, updated 2 months ago)
- The official repo of "One RL to See Them All: Visual Triple Unified Reinforcement Learning" (☆312, updated 3 months ago)
- OpenThinkIMG, an end-to-end open-source framework that empowers LVLMs to think with images (☆299, updated 3 months ago)
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning (☆210, updated 3 months ago)
- Official implementation of the Law of Vision Representation in MLLMs (☆165, updated 9 months ago)
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark (☆127, updated 3 months ago)
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) (☆152, updated last month)
- 📖 A repository for organizing papers, code, and other resources related to unified multimodal models (☆288, updated last week)
- A Unified Tokenizer for Visual Generation and Understanding (☆396, updated last month)
- Pixel-Level Reasoning Model trained with RL (☆201, updated last week)
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka (☆317, updated 2 months ago)
- Explore the Multimodal "Aha Moment" on a 2B Model (☆607, updated 5 months ago)
- Official repository for VisionZip (CVPR 2025) (☆347, updated last month)
- [ECCV 2024 Oral] Code for the paper "An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models" (☆484, updated 8 months ago)
- ✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints (☆73, updated 2 months ago)