prometheus-eval / prometheus-vision

[ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Designed specifically for fine-grained evaluation against customized score rubrics, Prometheus-Vision is a good alternative to human evaluation and GPT-4V evaluation.

☆78

Alternatives and similar repositories for prometheus-vision

Users who are interested in prometheus-vision are comparing it to the repositories listed below.

kaistAI / Volcano
[NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…
☆46 Updated last year
naver-ai / model-stock
Model Stock: All we need is just a few fine-tuned models
☆125 Updated 2 months ago
ByungKwanLee / TroL
[EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…
☆98 Updated last year
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆67 Updated 6 months ago
ByungKwanLee / Meteor
[NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…
☆115 Updated last year
alinlab / HOMER
Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).
☆43 Updated last year
psunlpgroup / VisOnlyQA
This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…
☆27 Updated 3 months ago
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆114 Updated 9 months ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆51 Updated 11 months ago
anguyen8 / vision-llms-are-blind
☆135 Updated 2 months ago
YiyangZhou / POVID
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆88 Updated last year
gregor-ge / mBLIP
☆87 Updated last year
sanjayss34 / codevqa
☆84 Updated 2 years ago
facebookresearch / unibench
Python Library to evaluate VLM models' robustness across diverse benchmarks
☆217 Updated last week
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆147 Updated last month
mlfoundations / VisIT-Bench
☆50 Updated 2 years ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆59 Updated last year
ncsoft / offsetbias
Official implementation of "OffsetBias: Leveraging Debiased Data for Tuning Evaluators"
☆25 Updated last year
apple / ml-rpm-bench
☆41 Updated last year
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆48 Updated last year
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆82 Updated 2 months ago
multimodal-interpretability / maia
Official implementation of MAIA, A Multimodal Automated Interpretability Agent
☆92 Updated last week
google / haloquest
☆23 Updated last year
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆51 Updated 2 months ago
kernelmachine / silo-lm
SILO Language Models code repository
☆83 Updated last year
wade3han / champagne
An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)
☆52 Updated 2 years ago
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆157 Updated last year
WildVision-AI / WildVision-Bench
☆16 Updated last year
huggingface / OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d…
☆206 Updated last year
Wang-ML-Lab / multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆49 Updated 5 months ago