JinjieNi / MixEval-X
The official github repo for MixEval-X, the first any-to-any, real-world benchmark.
☆14Updated 2 months ago
Alternatives and similar repositories for MixEval-X:
Users that are interested in MixEval-X are comparing it to the libraries listed below
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆42Updated 2 months ago
- ☆44Updated last week
- Code for paper: Unified Text-to-Image Generation and Retrieval☆15Updated 10 months ago
- ☆24Updated 3 weeks ago
- ☆16Updated 9 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆18Updated 10 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆15Updated 2 months ago
- PyTorch implementation of StableMask (ICML'24)☆12Updated 10 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆44Updated last year
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models☆42Updated last week
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆13Updated last month
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆35Updated 2 months ago
- MIO: A Foundation Model on Multimodal Tokens☆25Updated 4 months ago
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs☆31Updated last month
- Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…☆19Updated 4 months ago
- Preference Learning for LLaVA☆44Updated 5 months ago
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?☆16Updated last month
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆55Updated 3 weeks ago
- ☆10Updated 3 months ago
- ☆10Updated last week
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆28Updated 3 months ago
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"☆14Updated 5 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆24Updated 7 months ago
- Official implementation of ECCV24 paper: POA☆24Updated 8 months ago
- ☆10Updated last year
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆75Updated 4 months ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆19Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆20Updated 2 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"☆24Updated this week
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"☆43Updated 2 months ago