AILab-CVC / M2PT
[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
☆99 · Updated last year
Alternatives and similar repositories for M2PT:
Users who are interested in M2PT are comparing it to the libraries listed below:
- Official repository of paper "Subobject-level Image Tokenization" ☆70 · Updated last month
- Open source implementation of "Vision Transformers Need Registers" ☆176 · Updated last month
- Explore the Limits of Omni-modal Pretraining at Scale ☆97 · Updated 8 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 · Updated last year
- This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision ☆77 · Updated 10 months ago
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition ☆77 · Updated last month
- (ICLR 2024, CVPR 2024) SparseFormer ☆74 · Updated 5 months ago
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation ☆78 · Updated 10 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 · Updated last year
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆39 · Updated last month
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆58 · Updated 4 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆124 · Updated 8 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆81 · Updated 3 weeks ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆236 · Updated last year
- [NeurIPS 2024 Spotlight] The official implementation of MambaTree: Tree Topology is All You Need in State Space Model ☆92 · Updated 10 months ago
- An efficient PyTorch implementation of selective scan in one file, works with both CPU and GPU, with corresponding mathematical derivatio… ☆86 · Updated last year
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆70 · Updated 7 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆86 · Updated 7 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆95 · Updated 9 months ago
- Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders ☆108 · Updated 3 weeks ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆146 · Updated 2 weeks ago
- [NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks ☆116 · Updated 5 months ago
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT … ☆35 · Updated last year
- Implementation for SimDINO/SimDINOv2 ☆125 · Updated last month
- Adapting LLaMA Decoder to Vision Transformer ☆28 · Updated 11 months ago
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆175 · Updated 3 months ago