mbzuai-oryx / ALM-Bench
🔥 ALM-Bench is a multilingual, multi-modal, culturally diverse benchmark covering 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.
☆28 · Updated last week
Alternatives and similar repositories for ALM-Bench:
Users interested in ALM-Bench are comparing it to the libraries listed below.
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆82 · Updated this week
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆116 · Updated 7 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆73 · Updated 5 months ago
- Preference Learning for LLaVA ☆37 · Updated 3 months ago
- ☆39 · Updated 6 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆24 · Updated last month
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆31 · Updated 3 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 6 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 7 months ago
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) ☆14 · Updated last year
- [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆27 · Updated 3 weeks ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆59 · Updated 3 weeks ago
- ☆31 · Updated last year
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆49 · Updated 3 weeks ago
- ☆16 · Updated 3 months ago
- Holistic evaluation of multimodal foundation models ☆42 · Updated 6 months ago
- ☆36 · Updated 7 months ago
- Towards Evaluating the Robustness of Visual State Space Models ☆24 · Updated 5 months ago
- Code and datasets for "What's "up" with vision-language models? Investigating their struggle with spatial reasoning". ☆40 · Updated 11 months ago
- Code for "Enhancing In-context Learning via Linear Probe Calibration" ☆35 · Updated 9 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 · Updated last week
- Official implementation of the paper "STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models" ☆15 · Updated 5 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆104 · Updated 3 weeks ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆51 · Updated 5 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆79 · Updated 9 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆24 · Updated 3 months ago
- Official implementation of ECCV24 paper: POA ☆24 · Updated 6 months ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆14 · Updated 2 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆67 · Updated 2 months ago