gregor-ge / mBLIP

☆87

Alternatives and similar repositories for mBLIP

Users who are interested in mBLIP are comparing it to the libraries listed below.

LAION-AI / General-GPT
☆65 Updated 2 years ago
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆84 Updated 4 months ago
prometheus-eval / prometheus-vision
[ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific…
☆78 Updated last year
wade3han / champagne
An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)
☆52 Updated 2 years ago
RotsteinNoam / FuseCap
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
☆56 Updated last year
nttmdlab-nlp / SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
☆103 Updated 8 months ago
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 Updated 2 years ago
huggingface / m4-logs
M4 experiment logbook
☆58 Updated 2 years ago
huggingface / docmatix
A huge dataset for Document Visual Question Answering
☆20 Updated last year
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37 Updated last year
NiteshMethani / PlotQA
Dataset introduced in PlotQA: Reasoning over Scientific Plots
☆82 Updated 2 years ago
huggingface / OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…
☆210 Updated last year
dhansmair / flamingo-mini
Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
☆168 Updated 2 years ago
jeykigung / HiCLIP
☆30 Updated 2 years ago
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆79 Updated last year
allenai / grit_official
Official repository for the General Robust Image Task (GRIT) Benchmark
☆54 Updated 2 years ago
kongds / E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
☆273 Updated 11 months ago
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆248 Updated 10 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆68 Updated 7 months ago
facebookresearch / selective-vqa_ood
Implementation for the CVPR 2023 paper "Improving Selective Visual Question Answering by Learning from Your Peers" (https://arxiv.org/abs…
☆25 Updated 2 years ago
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆146 Updated last month
kakaobrain / coyo-align
ALIGN trained on COYO-dataset
☆29 Updated last year
sanjayss34 / codevqa
☆83 Updated 2 years ago
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆159 Updated last year
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆181 Updated 7 months ago
kirill-vish / Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
☆101 Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆119 Updated 10 months ago
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
☆143 Updated last year
htqin / GoogleBard-VisUnderstand
How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges
☆30 Updated 2 years ago