JinjieNi / MixEval-X
The official github repo for MixEval-X, the first any-to-any, real-world benchmark.
☆12Updated last month
Alternatives and similar repositories for MixEval-X:
Users that are interested in MixEval-X are comparing it to the libraries listed below
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆41Updated this week
- ☆41Updated 3 weeks ago
- FocusLLM: Scaling LLM’s Context by Parallel Decoding☆34Updated last month
- ☆23Updated 2 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆31Updated 2 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆17Updated 7 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆22Updated 2 weeks ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data☆33Updated 10 months ago
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image …☆63Updated last month
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters"☆17Updated 3 weeks ago
- This repo contains code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation"☆11Updated 3 weeks ago
- ☆15Updated 6 months ago
- Official implementation of ECCV24 paper: POA☆24Updated 5 months ago
- Preference Learning for LLaVA☆35Updated 2 months ago
- A minimal re-implementation of orthogonal fine-tuning (OFT) for LLMs. Based on nanoGPT and minLoRA.☆12Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆44Updated last month
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"☆24Updated 5 months ago
- Representing Rule-based Chatbots with Transformers☆19Updated 6 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs☆22Updated 3 months ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073☆26Updated 6 months ago
- Official implementation of the paper The Hidden Language of Diffusion Models☆69Updated last year
- ☆12Updated 3 weeks ago
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025]☆54Updated this week
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"☆41Updated 2 months ago
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"☆14Updated 2 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆76Updated 4 months ago
- ☆28Updated 11 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆48Updated 3 months ago
- imagetokenizer is a python package, helps you encoder visuals and generate visuals token ids from codebook, supports both image and video…☆30Updated 7 months ago
- PyTorch implementation of StableMask (ICML'24)☆12Updated 7 months ago