AILab-CVC / M2PT
[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
☆99 Updated last year
Alternatives and similar repositories for M2PT:
Users who are interested in M2PT are comparing it to the repositories listed below.
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆172 Updated last month
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 Updated last year
- Open source implementation of "Vision Transformers Need Registers" ☆168 Updated last month
- Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders ☆102 Updated 3 months ago
- Official repository of paper "Subobject-level Image Tokenization" ☆65 Updated 10 months ago
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference ☆151 Updated 5 months ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆69 Updated 5 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆97 Updated 10 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆96 Updated 6 months ago
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation ☆77 Updated 8 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆232 Updated last year
- [CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆134 Updated 2 weeks ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆122 Updated 7 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆52 Updated 3 months ago
- [CVPR 2024] Official implementation of "Universal Segmentation at Arbitrary Granularity with Language Instruction" ☆85 Updated last year
- ☆111 Updated 7 months ago
- [CVPR 2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆73 Updated 7 months ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆198 Updated 9 months ago
- ☆128 Updated 9 months ago
- ☆96 Updated 10 months ago
- [ECCV 2024] VISA: Reasoning Video Object Segmentation via Large Language Model ☆164 Updated 7 months ago
- Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor ☆103 Updated 8 months ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition ☆74 Updated 7 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆52 Updated last month
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆68 Updated last month
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆85 Updated 6 months ago
- [CVPR 2025] PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆75 Updated 2 weeks ago
- Official implementation of the Law of Vision Representation in MLLMs ☆151 Updated 4 months ago