joanrod / figure-diffusion
Generating figures from research papers, using textual captions from the paper.
☆14 Updated last year
Related projects:
- Interpretable Diffusion Via Information Decomposition ☆20 Updated 2 months ago
- This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness … ☆19 Updated last year
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆22 Updated last month
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆11 Updated last month
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆22 Updated last month
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆40 Updated 2 months ago
- Official implementation of the paper "The Hidden Language of Diffusion Models" ☆66 Updated 7 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆51 Updated 3 months ago
- VPEval codebase from "Visual Programming for Text-to-Image Generation and Evaluation" (NeurIPS 2023) ☆42 Updated 9 months ago
- Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION — TOKENIZER IS KEY TO VISUAL GENERATION"☆15Updated last week
- Official code for the ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?" ☆25 Updated 5 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆30 Updated last month
- Official code repo for "Editing Implicit Assumptions in Text-to-Image Diffusion Models" ☆81 Updated last year
- Implementation of "Foundation Model is Efficient Multimodal Multitask Model Selector" ☆33 Updated 6 months ago
- Visual question answering prompting recipes for large vision-language models ☆18 Updated last week
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆32 Updated 6 months ago
- [IJCAI'23] The official GitHub page of the paper "Diffusion Models for Non-autoregressive Text Generation: A Survey". ☆20 Updated 9 months ago
- Official repository for CoMM Dataset ☆16 Updated this week
- Implementation of CounterCurate, a data curation pipeline for both physical and semantic counterfactual image-caption pairs. ☆16 Updated 2 months ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 Updated 3 weeks ago
- (arXiv.2405.18406) RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives ☆26 Updated 3 months ago
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models - Google ☆20 Updated last month