cvlab-kaist / VIRAL
Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".
☆146Updated 3 months ago
Alternatives and similar repositories for VIRAL
Users who are interested in VIRAL are comparing it to the repositories listed below.
- Official implementation of "URECA : Unique Region Caption Anything"☆56Updated 6 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆58Updated 2 weeks ago
- ☆65Updated 2 months ago
- ICML2025☆63Updated 4 months ago
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure?☆190Updated last month
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆154Updated last week
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆59Updated 6 months ago
- [ICCV-2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs☆52Updated 5 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆128Updated last month
- Official implementation of "Referring Video Object Segmentation via Language Aligned Track Selection".☆40Updated 7 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆55Updated 4 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆113Updated 2 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆77Updated 3 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆94Updated 10 months ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward☆59Updated last month
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆137Updated 4 months ago
- ☆15Updated 4 months ago
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆63Updated 2 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆34Updated 7 months ago
- Official Implementation of "Towards Open-Vocabulary Semantic Segmentation without Semantic Labels" (NeurIPS 2024)☆53Updated last year
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆185Updated 7 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆233Updated 5 months ago
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects☆57Updated last year
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]☆76Updated 6 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆154Updated 4 months ago
- PyTorch implementation of NEPA☆288Updated 3 weeks ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆194Updated 3 weeks ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness☆63Updated 5 months ago
- ☆41Updated 6 months ago
- ☆83Updated last month