SparksJoe / Prism

A Framework for Decoupling and Assessing the Capabilities of VLMs

☆43

Alternatives and similar repositories for Prism

Users who are interested in Prism are comparing it to the repositories listed below.

MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆91 Updated last year
FudanNLPLAB / MouSi
☆74 Updated last year
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆54 Updated 8 months ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109 Updated 4 months ago
mlfoundations / VisIT-Bench
☆50 Updated last year
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 Updated last year
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆46 Updated 7 months ago
agents-x-project / PyVision
Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆128 Updated 3 months ago
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆85 Updated 5 months ago
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94 Updated last year
si0wang / ThinkLite-VL
☆101 Updated 4 months ago
EvolvingLMMs-Lab / VideoMMMU
☆60 Updated last month
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆83 Updated 8 months ago
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77 Updated 3 months ago
beichenzbc / BoostStep
Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆36 Updated 9 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆84 Updated 2 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆57 Updated 10 months ago
callsys / GMPO
Geometric-Mean Policy Optimization
☆86 Updated last week
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆148 Updated 11 months ago
RifleZhang / LLaVA-Reasoner-DPO
☆95 Updated 9 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆87 Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆61 Updated last year
apple / ml-mia-bench
This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
☆31 Updated 7 months ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆59 Updated 11 months ago
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆115 Updated 11 months ago
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆162 Updated 9 months ago
kokolerk / TON
[NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆47 Updated 3 weeks ago
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
RenShuhuai-Andy / TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆50 Updated last year
360CVGroup / Inner-Adaptor-Architecture
LMM solved catastrophic forgetting, AAAI2025
☆44 Updated 6 months ago