mbzuai-oryx / ALM-Bench
[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal, culturally diverse benchmark covering 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.
☆37 · Updated last week
Alternatives and similar repositories for ALM-Bench:
Users who are interested in ALM-Bench are comparing it to the repositories listed below.
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 2 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆76 · Updated 7 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆35 · Updated 5 months ago
- Official PyTorch Implementation for Task Vectors are Cross-Modal ☆22 · Updated 4 months ago
- Official implementation of ECCV24 paper: POA ☆24 · Updated 8 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 · Updated 5 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆25 · Updated 5 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 · Updated 2 months ago
- Holistic evaluation of multimodal foundation models ☆47 · Updated 8 months ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated last year
- ☆45 · Updated 3 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆42 · Updated 6 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 8 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆58 · Updated 2 months ago
- ☆32 · Updated last year
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis" ☆53 · Updated 8 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆28 · Updated 6 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆121 · Updated 9 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆128 · Updated 3 months ago
- Matryoshka Multimodal Models ☆99 · Updated 3 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆75 · Updated 4 months ago
- Bilingual Medical Mixture of Experts LLM ☆31 · Updated 5 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆26 · Updated 8 months ago
- ☆88 · Updated last year
- Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts ☆18 · Updated last month
- Official Implementation of DiffCLIP: Differential Attention Meets CLIP ☆26 · Updated last month
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) ☆16 · Updated last year
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆73 · Updated 3 weeks ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆57 · Updated 6 months ago
- Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆67 · Updated last month