MCR-PEFT / Ex-MCR
☆39Updated last year
Related projects ⓘ
Alternatives and complementary repositories for Ex-MCR
- Multimodal Video Understanding Framework (MVU)☆24Updated 6 months ago
- NeurIPS'2023 official implementation code☆59Updated last year
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆70Updated 2 months ago
- Language Repository for Long Video Understanding☆29Updated 5 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆17Updated 3 weeks ago
- Data-Efficient Multimodal Fusion on a Single GPU☆47Updated 6 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 3 months ago
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆27Updated 5 months ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated last year
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆47Updated 4 months ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆103Updated last year
- Official implement of MIA-DPO☆41Updated 3 weeks ago
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆23Updated this week
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆47Updated 10 months ago
- Official repository of paper "Subobject-level Image Tokenization"☆62Updated 7 months ago
- ☆62Updated 4 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆17Updated 3 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆47Updated last year
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆24Updated 2 weeks ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆41Updated 3 weeks ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆78Updated 8 months ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities☆94Updated 8 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression☆42Updated 3 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆26Updated last month
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.☆52Updated 2 months ago
- ☆72Updated 6 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆16Updated 2 weeks ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 5 months ago
- ☆38Updated last year
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆11Updated 8 months ago