xu-shitong / diffusion-image-captioning
Implementation of the paper https://arxiv.org/abs/2210.04559
☆54 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for diffusion-image-captioning
- ☆85 · Updated last year
- [CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m… ☆56 · Updated 5 months ago
- ☆55 · Updated last year
- Controllable image captioning model with unsupervised modes ☆21 · Updated last year
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆73 · Updated last year
- Source code for EMNLP 2022 paper "PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models" ☆47 · Updated 2 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" ☆65 · Updated 3 years ago
- ☆60 · Updated last year
- The SVO-Probes Dataset for Verb Understanding ☆31 · Updated 2 years ago
- This repo contains code and instructions for baselines in the VLUE benchmark. ☆41 · Updated 2 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing ☆21 · Updated last year
- Official repository for DiffCap: Exploring Continuous Diffusion on Image Captioning ☆6 · Updated last year
- ☆26 · Updated 2 years ago
- Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models" ☆25 · Updated 11 months ago
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR 2023) ☆40 · Updated last year
- ☆63 · Updated 5 years ago
- Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023) ☆22 · Updated last year
- Colorful Prompt Tuning for Pre-trained Vision-Language Models ☆47 · Updated 2 years ago
- ViLLA: Fine-grained vision-language representation learning from real-world data ☆39 · Updated last year
- 📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS) ☆48 · Updated last year
- Official repository for the A-OKVQA dataset ☆63 · Updated 6 months ago
- Official implementation for the MM'22 paper. ☆11 · Updated 2 years ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆69 · Updated 9 months ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles. ☆36 · Updated 2 years ago
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning ☆123 · Updated 2 years ago
- Official Repository for CVPR 2022 paper "REX: Reasoning-aware and Grounded Explanation" ☆18 · Updated 11 months ago
- Implementation for CVPR 2022 paper "Injecting Semantic Concepts into End-to-End Image Captioning". ☆41 · Updated 2 years ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 3 months ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning. ☆20 · Updated last year
- ☆25 · Updated 3 years ago