facebookresearch / unibench
Python Library to evaluate VLM models' robustness across diverse benchmarks
☆168 Updated last week
Related projects
Alternatives and complementary repositories for unibench
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆167 Updated 3 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ☆85 Updated last month
- Official implementation of the Law of Vision Representation in MLLMs ☆128 Updated 2 months ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆123 Updated 2 months ago
- Matryoshka Multimodal Models ☆81 Updated last month
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 Updated 4 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆175 Updated 3 weeks ago
- When do we not need larger vision models? ☆333 Updated 2 months ago
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" ☆59 Updated this week
- ☆145 Updated 3 weeks ago
- ☆64 Updated 4 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" ☆179 Updated last week
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆186 Updated 2 months ago
- M4 experiment logbook ☆56 Updated last year
- The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A su… ☆164 Updated last week
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆294 Updated 3 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆264 Updated this week
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆134 Updated 5 months ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆213 Updated 2 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆144 Updated this week
- Data release for the ImageInWords (IIW) paper. ☆200 Updated 5 months ago
- A family of highly capable yet efficient large multimodal models ☆161 Updated 2 months ago
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im… ☆101 Updated 5 months ago
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models ☆227 Updated last month
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆160 Updated last month
- Multimodal language model benchmark, featuring challenging examples ☆148 Updated 2 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆110 Updated 2 months ago
- Densely Captioned Images (DCI) dataset repository. ☆158 Updated 4 months ago
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆150 Updated 4 months ago
- Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆113 Updated last month