JinjieNi / MixEval-X

The official GitHub repo for MixEval-X, the first any-to-any, real-world benchmark.

☆14

Alternatives and similar repositories for MixEval-X

Users who are interested in MixEval-X are comparing it to the repositories listed below.

shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆46 Updated 5 months ago
yale-nlp / refdpo
☆16 Updated last year
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆63 Updated 3 months ago
passing2961 / Stark
Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…
☆19 Updated 7 months ago
jdf-prog / LLM-Engines
☆50 Updated 2 months ago
IntelLabs / multimodal_cognitive_ai
Research work on multimodal cognitive AI
☆65 Updated last month
jialuli-luka / SELMA
Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
☆34 Updated last year
hila-chefer / Conceptor
Official implementation of the paper The Hidden Language of Diffusion Models
☆74 Updated last year
Qichuzyy / POA
Official implementation of ECCV24 paper: POA
☆24 Updated last year
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆76 Updated 7 months ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆47 Updated 9 months ago
wmn-231314 / diffusion-data-constraint
☆41 Updated 2 weeks ago
wade3han / champagne
An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos (ICCV 23)"
☆52 Updated last year
kokolerk / TON
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆40 Updated 3 weeks ago
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆44 Updated last year
Wang-ML-Lab / multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆48 Updated 3 months ago
beichenzbc / BoostStep
Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆36 Updated 6 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86 Updated 10 months ago
abhi1nandy2 / yesbut_dataset
YesBut - Multimodal Satire Comprehension Dataset
☆18 Updated 9 months ago
mj-storytelling / DiversityTuning
Modifying Large Language Models Post-training for Diverse Creative Writing
☆46 Updated 2 months ago
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆29 Updated this week
apple / ml-mia-bench
This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
☆31 Updated 5 months ago
mlfoundations / VisIT-Bench
☆50 Updated last year
tianyi-lab / C3PO
Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆17 Updated 4 months ago
ZJU-REAL / LAPO
☆26 Updated 2 weeks ago
liziniu / cold_start_rl
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
☆17 Updated 5 months ago
psunlpgroup / VisOnlyQA
This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…
☆24 Updated last month
MLLM-Data-Contamination / MM-Detect
This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"
☆16 Updated last month
ZJU-REAL / Self-Braking-Tuning
Code for "Let LLMs Break Free from Overthinking via Self-Braking Tuning". https://arxiv.org/abs/2505.14604
☆45 Updated last week
leezythu / FocusLLM
FocusLLM: Scaling LLM’s Context by Parallel Decoding
☆43 Updated 8 months ago