jmiemirza / MMFM-Challenge

Official repository for the MMFM challenge

☆25

Alternatives and similar repositories for MMFM-Challenge

Users who are interested in MMFM-Challenge are comparing it to the repositories listed below.


amazon-science / QA-ViT
☆69 · Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆116 · Updated 10 months ago
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆194 · Updated last year
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆170 · Updated last month
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆149 · Updated 2 months ago
foundation-multimodal-models / CAPTURE
☆80 · Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 · Updated last year
Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆128 · Updated 3 weeks ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆59 · Updated last year
mlfoundations / VisIT-Bench
☆50 · Updated 2 years ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 · Updated 2 years ago
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147 · Updated last year
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆159 · Updated last year
HJYao00 / DenseConnector
[NeurIPS 2024] Dense Connector for MLLMs
☆180 · Updated last year
LAION-AI / scaling-laws-openclip
Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)
☆181 · Updated 5 months ago
vinid / neg_clip
NegCLIP.
☆38 · Updated 2 years ago
kongds / E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
☆274 · Updated 11 months ago
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆82 · Updated 9 months ago
WisconsinAIVision / ViP-LLaVA
[CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆331 · Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95 · Updated last year
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 · Updated 2 years ago
FreedomIntelligence / MLLM-Bench
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
☆72 · Updated last year
patrick-tssn / VideoHallucer
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆38 · Updated last month
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆68 · Updated 7 months ago
SivanDoveh / DAC
Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models"
☆27 · Updated 2 years ago
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆92 · Updated last year
teaching-clip-to-count / teaching-clip-to-count.github.io
☆16 · Updated 2 years ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
☆231 · Updated 8 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168 · Updated last year
OpenGVLab / MMT-Bench
[ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
☆115 · Updated last year