HKUST-LongGroup/CoMM

[CVPR 2025 Highlight] Official repository for CoMM Dataset

☆56

Alternatives and similar repositories for CoMM

Users that are interested in CoMM are comparing it to the libraries listed below.

Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31 · Aug 7, 2025 · Updated 11 months ago
princetonvisualai / icons
☆22 · Apr 24, 2025 · Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138 · May 8, 2025 · Updated last year
junha1125 / Domain-Adaptation-Generalization-in-ECCV-2024
☆16 · Sep 29, 2024 · Updated last year
Kwai-YuanQi / TaskGalaxy
Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
☆32 · Jul 16, 2025 · Updated last year
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆37 · Dec 11, 2024 · Updated last year
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425 · May 5, 2025 · Updated last year
ExplainableML / fomo_in_flux
Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]
☆62 · Dec 10, 2024 · Updated last year
leroy9472 / MeepleLM
☆27 · Jan 20, 2026 · Updated 6 months ago
jiyounglee-0523 / VisAlign
☆20 · Apr 23, 2024 · Updated 2 years ago
AlonMendelson / SGVL
☆17 · Dec 13, 2023 · Updated 2 years ago
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆33 · Mar 26, 2025 · Updated last year
wusize / OpenUni
☆189 · Jun 27, 2025 · Updated last year
GAIR-NLP / thinking-with-generated-images
Doodling our way to AGI ✏️ 🖼️ 🧠
☆128 · May 29, 2025 · Updated last year
yunfeixie233 / ViGaL
☆70 · Feb 4, 2026 · Updated 5 months ago
ys-zong / VL-ICL
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆69 · Sep 20, 2025 · Updated 10 months ago
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆20 · Mar 28, 2024 · Updated 2 years ago
TencentARC / TVTS
Turning to Video for Transcript Sorting
☆49 · Aug 27, 2023 · Updated 2 years ago
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
deepglint / Victor
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
☆29 · Aug 15, 2025 · Updated 11 months ago
xichenpan / Kosmos-G
Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models
☆75 · May 25, 2024 · Updated 2 years ago
StanfordMIMI / villa
[ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data
☆45 · Oct 15, 2023 · Updated 2 years ago
UT-SysML / rumors-in-multi-agent
Code for AAAI Workshop WMAC paper "Simulating Rumor Spreading in Social Networks using LLM agents"
☆13 · Feb 20, 2025 · Updated last year
WangFei-2019 / SNARE
Project for SNARE benchmark
☆11 · Jun 5, 2024 · Updated 2 years ago
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆255 · Apr 3, 2024 · Updated 2 years ago
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆197 · Jul 1, 2024 · Updated 2 years ago
omipan / svl_adapter
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
☆21 · Jan 11, 2024 · Updated 2 years ago
sunxm2357 / DIME-FM
Implementation of "DIME-FM: DIstilling Multimodal and Efficient Foundation Models"
☆15 · Oct 12, 2023 · Updated 2 years ago
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
linzhiqiu / visual_gpt_score
VisualGPTScore for visio-linguistic reasoning
☆27 · Oct 7, 2023 · Updated 2 years ago
Tree-Shu-Zhao / RebQ.pytorch
This is the official code for the paper "Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaborati…
☆12 · Aug 13, 2024 · Updated last year
djghosh13 / geneval
GenEval: An object-focused framework for evaluating text-to-image alignment
☆472 · Mar 3, 2025 · Updated last year
mlfoundations / clip_quality_not_quantity
☆28 · Oct 18, 2022 · Updated 3 years ago
Phantom-video / Phantom-Data
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
☆116 · Feb 25, 2026 · Updated 5 months ago
GAIR-NLP / anole
[Extended version ICLR 2025 Blog Track] Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generatio…
☆842 · Jun 16, 2025 · Updated last year
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374 · Jul 24, 2025 · Updated last year
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215 · Feb 27, 2024 · Updated 2 years ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆33 · May 16, 2024 · Updated 2 years ago
bardisafa / PreSel
[CVPR 2025] An Implementation of the paper "Pre-Instruction Data Selection for Visual Instruction Tuning"
☆17 · Jun 9, 2025 · Updated last year