IVY-LVLM / Video-MA2MBA
Official Implementation of Video-MA2MBA
☆12 · Updated 9 months ago
Alternatives and similar repositories for Video-MA2MBA
Users who are interested in Video-MA2MBA are comparing it to the repositories listed below.
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆18 · Updated 11 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 · Updated last year
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆13 · Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Updated 7 months ago
- ☆12 · Updated 8 months ago
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention ☆28 · Updated 6 months ago
- Official Implementation of GENIUS: A Generative Framework for Universal Multimodal Search, CVPR 2025 ☆28 · Updated last month
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆29 · Updated last week
- Data-Efficient Multimodal Fusion on a Single GPU ☆68 · Updated last year
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆19 · Updated 2 months ago
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral) ☆32 · Updated 5 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆50 · Updated 2 months ago
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation" ☆33 · Updated 11 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆24 · Updated 3 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆42 · Updated 9 months ago
- Official implementation for CIGN ☆16 · Updated 2 years ago
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆35 · Updated last week
- NeurIPS'2023 official implementation code ☆65 · Updated last year
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024 ☆17 · Updated 11 months ago
- ☆32 · Updated 5 months ago
- ☆24 · Updated 5 months ago
- ☆23 · Updated 3 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆29 · Updated last year
- Agentic Keyframe Search for Video Question Answering ☆10 · Updated 5 months ago
- ☆12 · Updated 7 months ago
- [ICLR2025] This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision ☆84 · Updated 3 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆71 · Updated 10 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 · Updated 10 months ago
- ☆44 · Updated 4 months ago
- [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities