simran-khanuja / image-transcreation
☆23Updated 5 months ago
Alternatives and similar repositories for image-transcreation
Users who are interested in image-transcreation are comparing it to the repositories listed below
- The Conceptual Coverage Across Languages Benchmark for Text-to-Image Models☆12Updated 10 months ago
- Repository for the paper: Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models☆27Updated last year
- [CVPR 2025 🔥] ALM-Bench is a multilingual multi-modal diverse cultural benchmark for 100 languages across 19 categories. It assesses the…☆46Updated 3 months ago
- This is the official implementation of the paper "MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision…☆31Updated last year
- [NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs☆22Updated 4 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆27Updated 2 months ago
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"☆14Updated 11 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆46Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆46Updated 11 months ago
- Corpus to accompany: "Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest"☆57Updated 6 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆44Updated last year
- ☆29Updated 2 years ago
- Official repository for the MMFM challenge☆25Updated last year
- ☆130Updated 3 years ago
- Data repository for the VALSE benchmark.☆37Updated last year
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Updated 10 months ago
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"☆83Updated last year
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)☆16Updated last year
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".☆60Updated last month
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆102Updated 6 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆83Updated last month
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆12Updated last year
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…☆13Updated last year
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆140Updated last year
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"☆39Updated last year
- [ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"☆96Updated last year
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 4 years ago
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022☆31Updated 2 years ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆67Updated 5 months ago
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific…☆77Updated last year