vlm2-bench/VLM2-Bench

VLM2-Bench [ACL 2025 Main]: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues

☆45

Alternatives and similar repositories for VLM2-Bench

Users who are interested in VLM2-Bench are comparing it to the repositories listed below.

JiayuJeff / CostBench
The official code repository for the paper "CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments…
☆33 · Jun 14, 2026 · Updated last month
JiayuJeff / PlanBench-XL
Official Repository for our paper: PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems
☆38 · Jul 16, 2026 · Updated last week
YangHaolin0526 / MARS-SQL
☆43 · Dec 19, 2025 · Updated 7 months ago
RickySkywalker / LeanOfThought-Official
This is the official implementation for MA-LoT.
☆20 · Aug 4, 2025 · Updated 11 months ago
WenkeHuang / HeteroFL
Benchmark for Heterogeneous Federated Learning by the MARS Group at Wuhan University, led by Prof. Mang Ye.
☆19 · May 29, 2023 · Updated 3 years ago
lukahhcm / Awesome_Environment_Scaling
Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to …
☆72 · Jan 28, 2026 · Updated 5 months ago
G-JWLee / TAMP
☆12 · May 15, 2025 · Updated last year
google / haloquest
☆25 · Aug 2, 2024 · Updated last year
seq-to-mind / planning_dial_summ
One implementation of the paper "Controllable Neural Dialogue Summarization with Personal Named Entity Planning" (EMNLP 2022).
☆18 · Nov 9, 2023 · Updated 2 years ago
ShujinWu-0814 / MACAROON
Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"
☆14 · Sep 28, 2024 · Updated last year
XSkill-Agent / XSkill
[ICML 2026] XSkill: Continual Learning from Experience and Skills in Multimodal Agents
☆239 · May 13, 2026 · Updated 2 months ago
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18 · Dec 19, 2024 · Updated last year
SKURA502 / sae-analysis
A toolkit for systematically understanding the concepts encoded in Sparse Autoencoders.
☆20 · Apr 5, 2026 · Updated 3 months ago
hkust-nlp / AgentVista
Benchmarking multimodal agents on realistic, ultra-challenging visual scenarios requiring long-horizon hybrid tool use.
☆67 · Mar 10, 2026 · Updated 4 months ago
LiYu0524 / ATbench
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis
☆33 · Jul 10, 2026 · Updated 2 weeks ago
Bala93 / CLIPCalib
Model calibration in CLIP Adapters
☆20 · Aug 19, 2024 · Updated last year
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆54 · Oct 19, 2024 · Updated last year
boyiwei / CoTaEval
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
☆17 · Jul 17, 2024 · Updated 2 years ago
Sphere-AI-Lab / FormalMATH-Bench
Repository of "FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models"
☆75 · Jan 8, 2026 · Updated 6 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆34 · Mar 6, 2025 · Updated last year
sunblaze-ucb / cybergym-e2e
CyberGym-E2E is a large-scale benchmark built from real-world vulnerabilities in widely used open-source projects to evaluate AI agents' …
☆29 · Jun 25, 2026 · Updated 3 weeks ago
jczhang02 / MUSIC_dataset_script
This repo contains a script to download the MUSIC dataset from YouTube.
☆12 · Jan 19, 2024 · Updated 2 years ago
HKUST-KnowComp / IntentionQA
Code and data for the paper: IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Large Language Models …
☆12 · Apr 27, 2024 · Updated 2 years ago
Zhaoyi-Li21 / creme
[ACL 2024] "Understanding and Patching Compositional Reasoning in LLMs"
☆14 · Aug 28, 2024 · Updated last year
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆90 · Jan 19, 2025 · Updated last year
Yangyi-Chen / CoTConsistency
The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆34 · Sep 16, 2023 · Updated 2 years ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆172 · Jul 30, 2024 · Updated last year
TaiMingLu / know-dont-tell
☆19 · Oct 14, 2024 · Updated last year
Zhitao-He / AgentsCourt
AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation (EMNLP 2024 Findings)
☆18 · Dec 30, 2024 · Updated last year
hongchengzhu / VoxTracer
Official Implementation of VoxTracer (MM '23)
☆12 · Oct 27, 2023 · Updated 2 years ago
mlvlab / ProMetaR
Official implementation of CVPR 2024 paper "Prompt Learning via Meta-Regularization".
☆31 · Mar 10, 2025 · Updated last year
jinzhuoran / MiNer
A Good Neighbor, A Found Treasure: Mining Treasured Neighbors for Knowledge Graph Entity Typing. EMNLP 2022
☆11 · Feb 1, 2023 · Updated 3 years ago
StarDewXXX / UltraHorizon
Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
☆27 · Sep 30, 2025 · Updated 9 months ago
Bookmaster9 / kNN-latentMAS
LatentMAS with kNN kv cache pruning | up to 40% more memory efficient and 30% faster
☆19 · Dec 10, 2025 · Updated 7 months ago
cywinski / guide
☆13 · Jan 16, 2025 · Updated last year
Vekteur / probabilistic-calibration-study
Implementation of "A Large-Scale Study of Probabilistic Calibration in Neural Network Regression" (ICML 2023)
☆11 · Oct 7, 2025 · Updated 9 months ago
Vipermdl / E2E-ERFNet
A re-implementation of "End-to-End Lane Marker Detection via Row-wise Classification"
☆14 · Sep 21, 2020 · Updated 5 years ago
EsYoon7 / RLHF-TLCR
[ACL'24 Findings] Official code for "TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback"
☆12 · Dec 6, 2024 · Updated last year
VIM-Bench / VIM_TOOL
☆12 · Jun 12, 2024 · Updated 2 years ago