JieyuZ2 / TaskMeAnythingLinks

[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.

☆73

Alternatives and similar repositories for TaskMeAnything

Users that are interested in TaskMeAnything are comparing it to the libraries listed below

Sorting:

yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆122Updated 4 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆61Updated last year
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆47Updated 9 months ago
agents-x-project / PyVision
[MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆137Updated 4 months ago
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77Updated 5 months ago
SalesforceAIResearch / LATTE
☆68Updated 2 months ago
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆71Updated last year
kokolerk / TON
[NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆51Updated 2 months ago
beichenzbc / BoostStep
official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆36Updated 10 months ago
yunfeixie233 / ViGaL
☆62Updated last month
xufangzhi / Genius
[ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework
☆70Updated 6 months ago
Wang-ML-Lab / multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆50Updated 7 months ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70Updated last year
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆168Updated 6 months ago
Yushi-Hu / VisualSketchpad
Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
☆269Updated 4 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90Updated last year
yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆39Updated 6 months ago
EvolvingLMMs-Lab / multimodal-sae
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆165Updated 2 months ago
callsys / GMPO
Geometric-Mean Policy Optimization
☆95Updated 2 weeks ago
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆55Updated 9 months ago
hewei2001 / ReachQA
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
☆57Updated 3 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆87Updated 3 months ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆49Updated last year
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆150Updated 2 months ago
ASTRAL-Group / AlphaOne
[EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
☆86Updated 5 months ago
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43Updated last year
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109Updated 6 months ago
zeyofu / ReFocus_Code
Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]
☆43Updated 4 months ago
neulab / MultiUI
Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding
☆53Updated 11 months ago
EvolvingLMMs-Lab / VideoMMMU
☆62Updated 3 months ago