pliang279 / HEMM
Holistic evaluation of multimodal foundation models
☆41 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for HEMM
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" ☆40 · Updated last month
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆73 · Updated 6 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆34 · Updated 8 months ago
- Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" ☆34 · Updated this week
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆59 · Updated 5 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆70 · Updated 2 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆45 · Updated last month
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆28 · Updated last month
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 · Updated 4 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆33 · Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆45 · Updated 5 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆62 · Updated 3 months ago
- Language Quantized AutoEncoders ☆94 · Updated last year
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆43 · Updated 3 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆47 · Updated 4 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆36 · Updated last year
- Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023) ☆32 · Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆42 · Updated 3 weeks ago
- Official pytorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP" ☆28 · Updated this week
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024). ☆32 · Updated 3 months ago
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … ☆55 · Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆115 · Updated last week
- Visual question answering prompting recipes for large vision-language models ☆22 · Updated 2 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆123 · Updated 8 months ago
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific… ☆57 · Updated 2 months ago
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners" ☆12 · Updated last month