Vchitect / Uni-MMMU
☆19Updated this week
Alternatives and similar repositories for Uni-MMMU
Users that are interested in Uni-MMMU are comparing it to the libraries listed below
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM☆20Updated 6 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆31Updated 5 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆131Updated 3 months ago
- The official implementation of paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation”☆109Updated 2 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)☆85Updated 9 months ago
- Official code for MotionBench (CVPR 2025)☆60Updated 9 months ago
- Visual Spatial Tuning☆152Updated last week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆94Updated 9 months ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward☆49Updated 2 weeks ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆80Updated 4 months ago
- ☆40Updated 5 months ago
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization☆24Updated 7 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆68Updated last month
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆60Updated last month
- ☆63Updated last month
- ☆26Updated 8 months ago
- Official implementation of EgoThinker at NIPS 2025☆21Updated 2 weeks ago
- ICML2025☆61Updated 3 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆89Updated 8 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆71Updated 3 weeks ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆40Updated 9 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 4 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆150Updated 2 months ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"☆160Updated last month
- ☆26Updated 8 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…☆34Updated 4 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆120Updated last month
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆30Updated 3 weeks ago
- ☆21Updated last year
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model☆79Updated 2 weeks ago