This repo contains code and data for the ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
☆39Mar 9, 2025Updated last year
Alternatives and similar repositories for ml-mia-bench
Users that are interested in ml-mia-bench are comparing it to the libraries listed below.
- Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models.☆10May 16, 2024Updated 2 years ago
- [ICLR 2026] Adaptive Social Learning via Mode Policy Optimization for Language Agents☆51Feb 2, 2026Updated 4 months ago
- ☆31Sep 12, 2025Updated 9 months ago
- ☆13Jul 10, 2024Updated last year
- ☆28Oct 28, 2024Updated last year
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models☆60Jul 23, 2024Updated last year
- Awesome multi-modal large language paper/project, collections of popular training strategies, e.g., PEFT, LoRA.☆27Aug 2, 2024Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 11 months ago
- Developer project for getting basic API integrations working in under 5 minutes☆11May 22, 2026Updated 3 weeks ago
- Paper list of compositional zero-shot learning☆11Jul 5, 2022Updated 3 years ago
- GPT Demo with hybrid distributed training☆10Dec 1, 2022Updated 3 years ago
- Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification☆11Nov 15, 2023Updated 2 years ago
- A dataset of scientific vector graphics in TikZ for training generative models.☆27Feb 4, 2026Updated 4 months ago
- Learning Safety Constraints for Large Language Models (ICML2025)☆35May 25, 2026Updated 3 weeks ago
- [ICLR'25 Oral] MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models☆36Nov 3, 2024Updated last year
- ☆17Feb 22, 2024Updated 2 years ago
- ☆31May 21, 2026Updated 3 weeks ago
- Confidence Regulation Neurons in Language Models (NeurIPS 2024)☆15Feb 1, 2025Updated last year
- LLM - Detect AI Generated Text || Identify which essay was written by a large language model☆17Jan 17, 2024Updated 2 years ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆72Apr 2, 2025Updated last year
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension☆52Dec 3, 2024Updated last year
- [NeurIPS 2025] L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models☆28May 8, 2026Updated last month
- Multimodal RewardBench☆68Feb 21, 2025Updated last year
- [TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning".☆10Aug 14, 2024Updated last year
- ☆23Apr 3, 2025Updated last year
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆64May 15, 2025Updated last year
- See the device (CPU/GPU/ANE) and estimated cost for every layer in your CoreML model.☆25Oct 23, 2025Updated 7 months ago
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset)☆20May 18, 2024Updated 2 years ago
- An up-to-date list of works on Multi-domain Multi-task learning☆18Oct 20, 2022Updated 3 years ago
- ☆19Oct 28, 2025Updated 7 months ago
- [IJCNN 2024] Multi-Objective Optimization for Sparse Deep Multi-Task Learning☆16May 22, 2025Updated last year
- [CVPR 2023] Code for the paper "Masked Images Are Counterfactual Samples for Robust Fine-tuning"☆14Mar 24, 2023Updated 3 years ago
- INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness☆15Jun 2, 2026Updated 2 weeks ago
- ☆18Jun 3, 2024Updated 2 years ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆59May 28, 2025Updated last year
- WritingBench: A Comprehensive Benchmark for Generative Writing☆182Dec 19, 2025Updated 5 months ago
- ☆11Jan 16, 2025Updated last year
- A reinforcement learning course, mainly about how to solve problems with reinforcement learning☆15Dec 10, 2024Updated last year
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific…☆86Sep 13, 2024Updated last year