simran-khanuja / image-transcreationLinks
☆24Updated 8 months ago
Alternatives and similar repositories for image-transcreation
Users that are interested in image-transcreation are comparing it to the libraries listed below
Sorting:
- The Conceptual Coverage Across Languages Benchmark for Text-to-Image Models☆12Updated last year
- Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models☆27Updated 2 years ago
- About The corresponding code from our paper " Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆12Updated last year
- [NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs☆37Updated last week
- [CVPR 2025 🔥] ALM-Bench is a multilingual multi-modal diverse cultural benchmark for 100 languages across 19 categories. It assesses the…☆45Updated 6 months ago
- ☆130Updated 3 years ago
- This is the official implementation of the paper "MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision…☆31Updated last year
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and inexpensive to use. Specific…☆78Updated last year
- Retrieval-augmented Image Captioning☆13Updated 2 years ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆50Updated last year
- Code repository for the paper "Mission: Impossible Language Models."☆56Updated 2 months ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆113Updated 9 months ago
- [ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"☆96Updated 3 weeks ago
- ☆41Updated last year
- [ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"☆99Updated last year
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)☆16Updated last year
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆169Updated 2 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆47Updated last year
- Corpus to accompany: "Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest"☆58Updated 9 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆84Updated 4 months ago
- Data repository for the VALSE benchmark.☆37Updated last year
- BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages☆42Updated 4 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆44Updated last year
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022☆31Updated 2 years ago
- ☆29Updated 3 years ago
- ☆24Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer☆142Updated 11 months ago
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)☆49Updated last year
- Holistic evaluation of multimodal foundation models☆47Updated last year
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 4 years ago