Yushi-Hu / PromptCap
natual language guided image captioning
☆82Updated last year
Alternatives and similar repositories for PromptCap:
Users that are interested in PromptCap are comparing it to the libraries listed below
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)☆151Updated 9 months ago
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆48Updated 2 years ago
- ☆37Updated last year
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities☆40Updated 8 months ago
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning☆135Updated last year
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)☆28Updated last year
- Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …☆60Updated 2 years ago
- Official repository for the A-OKVQA dataset☆84Updated last year
- MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering☆94Updated 2 years ago
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION☆36Updated 2 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"☆66Updated 3 years ago
- ☆46Updated last year
- Code for our EMNLP-2022 paper: "Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning"☆15Updated 2 years ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)☆86Updated last year
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 9 months ago
- MixGen: A New Multi-Modal Data Augmentation☆122Updated 2 years ago
- Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners☆115Updated 2 years ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆49Updated last year
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"☆73Updated last year
- ☆83Updated 3 years ago
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation☆107Updated last year
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval…☆41Updated last year
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)☆190Updated 2 years ago
- ☆68Updated 6 years ago
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"☆134Updated 2 years ago
- A lightweight codebase for referring expression comprehension and segmentation☆54Updated 2 years ago
- 🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)☆64Updated last year
- Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023))☆90Updated last year
- Implementation for CVPR 2022 paper " Injecting Semantic Concepts into End-to-End Image Captionin".☆43Updated 2 years ago
- Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos☆23Updated 10 months ago