MCR-PEFT / Ex-MCR
☆41Updated last year
Alternatives and similar repositories for Ex-MCR:
Users that are interested in Ex-MCR are comparing it to the libraries listed below
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆30Updated this week
- NeurIPS'2023 official implementation code☆59Updated last year
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 5 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆72Updated 4 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models☆42Updated last month
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆19Updated 3 weeks ago
- Multimodal Video Understanding Framework (MVU)☆26Updated 8 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆45Updated 2 months ago
- ☆38Updated last year
- ☆44Updated 8 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆14Updated 3 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆18Updated 2 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆19Updated 3 months ago
- Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…☆54Updated 6 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆31Updated last month
- ☆37Updated 2 months ago
- ☆21Updated 3 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆79Updated 9 months ago
- ☆61Updated 6 months ago
- Official repository of paper "Subobject-level Image Tokenization"☆64Updated 8 months ago
- Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆43Updated last year
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model☆36Updated this week
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆18Updated 2 months ago
- ☆91Updated 8 months ago
- A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.☆33Updated 6 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆49Updated last year
- Data-Efficient Multimodal Fusion on a Single GPU☆51Updated 8 months ago
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 2 months ago