cvlab-kaist / VIRAL
Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".
☆132 Updated last month
Alternatives and similar repositories for VIRAL
Users who are interested in VIRAL are comparing it to the repositories listed below
- Official implementation of "URECA: Unique Region Caption Anything" ☆53 Updated 3 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆50 Updated 3 months ago
- Official Implementation of "Towards Open-Vocabulary Semantic Segmentation without Semantic Labels" (NeurIPS 2024) ☆52 Updated last year
- Official implementation of "Referring Video Object Segmentation via Language Aligned Track Selection". ☆40 Updated 4 months ago
- ☆53 Updated last month
- ☆13 Updated last month
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆45 Updated last month
- Official implementation of "Emergent Temporal Correspondences from Video Diffusion Models" ☆82 Updated 4 months ago
- Official repository for ReasonGen-R1 ☆70 Updated 4 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆56 Updated 3 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆48 Updated this week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆91 Updated 7 months ago
- [CVPR 2025] Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation ☆18 Updated 2 months ago
- [ICCV'25] Official implementation of "Reangle-A-Video: 4D Video Generation as Video-to-Video Translation" ☆73 Updated 3 months ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆82 Updated 2 weeks ago
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper ☆93 Updated 2 months ago
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆177 Updated 5 months ago
- [ICCV-2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs ☆45 Updated 3 months ago
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ☆178 Updated 7 months ago
- ☆30 Updated 10 months ago
- ☆40 Updated 3 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆138 Updated 10 months ago
- ☆109 Updated 2 months ago
- [CVPR 2025] GPS as a Control Signal for Image Generation ☆24 Updated 7 months ago
- Official implementation of the paper "Transfer between Modalities with MetaQueries"