cvlab-kaist / VIRAL
Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".
☆141 · Updated 3 months ago
Alternatives and similar repositories for VIRAL
Users who are interested in VIRAL are comparing it to the repositories listed below.
- Official implementation of "URECA : Unique Region Caption Anything" ☆56 · Updated 5 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆58 · Updated last month
- Official Implementation of "Towards Open-Vocabulary Semantic Segmentation without Semantic Labels" (NeurIPS 2024) ☆53 · Updated last year
- Official implementation of "Referring Video Object Segmentation via Language Aligned Track Selection". ☆40 · Updated 6 months ago
- ☆65 · Updated last month
- [ICCV-2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs ☆49 · Updated 5 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆49 · Updated 3 months ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward ☆55 · Updated last month
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure? ☆161 · Updated 2 weeks ago
- ICML2025 ☆62 · Updated 4 months ago
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper ☆94 · Updated last month
- ☆14 · Updated 3 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 · Updated 9 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆73 · Updated 2 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆59 · Updated 5 months ago
- (ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations ☆125 · Updated last month
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆73 · Updated this week
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆141 · Updated last month
- PyTorch implementation of NEPA ☆196 · Updated this week
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P… ☆63 · Updated 2 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆133 · Updated 4 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆100 · Updated 5 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆143 · Updated last year
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆184 · Updated 7 months ago
- Visual Spatial Tuning ☆157 · Updated 3 weeks ago
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆24 · Updated 9 months ago
- [NeurIPS'25] Official implementation of "Emergent Temporal Correspondences from Video Diffusion Models" ☆90 · Updated 3 weeks ago
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025] ☆76 · Updated 6 months ago
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆43 · Updated 6 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆127 · Updated last week