ShareLab-SII/CoMP-MM

ShareLab-SII / CoMP-MM

Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"

☆48

Alternatives and similar repositories for CoMP-MM

Users that are interested in CoMP-MM are comparing it to the libraries listed below.

Row11n / Prova
[AAAI-25] Official repository of "Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object De…
☆20 · Dec 27, 2024 · Updated last year
MengLcool / SliMM
☆25 · Dec 26, 2024 · Updated last year
wdrink / OpenTokenizer
☆21 · Jan 17, 2025 · Updated last year
inst-it / inst-it
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…
☆40 · Feb 20, 2025 · Updated last year
ShareLab-SII / UniAR
[ICML 2026] The official implementation of paper "Unified Multimodal Autoregressive Modeling with Shared Context—Visual Tokenizer is Key …
☆46 · Jul 13, 2026 · Updated last week
DAMO-NLP-SG / CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆54 · Jul 11, 2025 · Updated last year
MileBench / MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
☆38 · Jul 11, 2024 · Updated 2 years ago
CodeGoat24 / LiFT
Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.
☆85 · May 4, 2025 · Updated last year
xiaoxing2001 / DeGLA
[ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆16 · Jul 15, 2025 · Updated last year
chs20 / fuselip
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
☆17 · Sep 8, 2025 · Updated 10 months ago
OpenCausaLab / ADAM
We introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, le…
☆33 · Apr 7, 2025 · Updated last year
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18 · Dec 19, 2024 · Updated last year
CodeGoat24 / UniGenBench
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
☆139 · Jun 19, 2026 · Updated last month
MIV-XJTU / FLAME
[CVPR 2025] PyTorch implementation of paper "FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training"
☆33 · Jul 8, 2025 · Updated last year
Multimodal-Representation-Learning-MRL / GA-DMS
[EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"
☆25 · Mar 30, 2026 · Updated 3 months ago
Osilly / dynamic_llava
[ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
☆72 · Sep 18, 2025 · Updated 10 months ago
CodeGoat24 / Face-diffuser
[CVPR2024] Official implementation of High-fidelity Person-centric Subject-to-Image Synthesis.
☆53 · Feb 26, 2025 · Updated last year
CodeGoat24 / MagicFace
Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis.
☆66 · Dec 24, 2024 · Updated last year
air-embodied-brain / Em-Garde
Implementation of Em_Garde: a proposal-retrieval framework for streaming video understanding
☆26 · Jun 24, 2026 · Updated 3 weeks ago
anxiangsir / Video_Benchmark_Suite
Video Benchmark Suite: Rapid Evaluation of Video Foundation Models
☆17 · Jan 10, 2025 · Updated last year
multimodal-art-projection / TreePO
☆65 · Mar 30, 2026 · Updated 3 months ago
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆58 · Oct 10, 2024 · Updated last year
Cooperx521 / ScaleCap
(ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'
☆60 · Jan 26, 2026 · Updated 5 months ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆52 · Jun 16, 2025 · Updated last year
ShareLab-SII / CaTok
[CVPR-26] Official repository of "CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization"
☆19 · Mar 9, 2026 · Updated 4 months ago
Euphoria16 / DocMark
[CVPR 2025] Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
☆16 · Jun 16, 2025 · Updated last year
leroy9472 / InMind
☆15 · Nov 18, 2025 · Updated 8 months ago
CodeGoat24 / DreamText
[CVPR2025] Official implementation of High Fidelity Scene Text Synthesis.
☆82 · Mar 24, 2025 · Updated last year
yaolinli / TimeChat-Online
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆132 · Jun 29, 2026 · Updated 3 weeks ago
waltonfuture / MM-UPT
[NeurIPS 2025] First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
☆88 · Oct 29, 2025 · Updated 8 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆93 · Jun 17, 2024 · Updated 2 years ago
chenllliang / MMEvalPro
[NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆25 · Sep 26, 2024 · Updated last year
CodeGoat24 / Pref-GRPO
Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
☆274 · Feb 10, 2026 · Updated 5 months ago
Topdu / DocPTBench
Benchmarking End-to-End Photographed Document Parsing and Translation
☆17 · Dec 4, 2025 · Updated 7 months ago
locuslab / T-MARS
Code for T-MARS data filtering
☆35 · Aug 23, 2023 · Updated 2 years ago
raghavlite / B3
☆43 · Jan 12, 2026 · Updated 6 months ago
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆150 · Nov 14, 2024 · Updated last year
deepshwang / crepa
☆15 · Jun 21, 2025 · Updated last year
RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year