gregor-ge / mBLIP
☆89Updated last year
Alternatives and similar repositories for mBLIP:
Users who are interested in mBLIP are comparing it to the repositories listed below
- ☆64Updated last year
- Matryoshka Multimodal Models☆90Updated last month
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 4 months ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training☆133Updated last year
- ☆47Updated last year
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆138Updated 7 months ago
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"☆76Updated 8 months ago
- ☆132Updated last year
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆143Updated 2 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆49Updated last year
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆112Updated 6 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆215Updated 3 weeks ago
- ☆83Updated last year
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight)☆169Updated 3 weeks ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆88Updated 9 months ago
- [BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"☆54Updated 2 years ago
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"☆55Updated last year
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific…☆62Updated 4 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆59Updated 3 months ago
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation☆51Updated 9 months ago
- Official repository for the General Robust Image Task (GRIT) Benchmark☆50Updated last year
- Official repo for StableLLAVA☆94Updated last year
- A huge dataset for Document Visual Question Answering☆15Updated 5 months ago
- An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)☆52Updated last year
- Densely Captioned Images (DCI) dataset repository.☆167Updated 6 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆43Updated 7 months ago
- ☆72Updated 8 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆140Updated 3 months ago
- A family of highly capable yet efficient large multimodal models☆176Updated 4 months ago
- ☆61Updated 6 months ago