A curated list of zero-shot captioning papers
☆24Aug 26, 2023Updated 2 years ago
Alternatives and similar repositories for awesome-zero-shot-captioning
Users that are interested in awesome-zero-shot-captioning are comparing it to the libraries listed below
Sorting:
- ☆11Oct 2, 2024Updated last year
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023☆162Sep 9, 2024Updated last year
- Efficient Feature Extraction for High-resolution Video Frame Interpolation (BMVC 2022)☆13Aug 24, 2023Updated 2 years ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- (NeurIPS 2019) Combinatorial Inference against Label Noise☆11Jun 13, 2024Updated last year
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurlPS 2024.☆34Sep 26, 2024Updated last year
- ☆15May 23, 2022Updated 3 years ago
- ☆14Oct 31, 2022Updated 3 years ago
- ☆16May 23, 2023Updated 2 years ago
- CLAIR: A (surprisingly) simple semantic text metric with large language models.☆22Jan 28, 2024Updated 2 years ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆37Oct 18, 2023Updated 2 years ago
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https…☆22Jul 5, 2024Updated last year
- [ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.☆19Jun 7, 2024Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆62Apr 8, 2024Updated last year
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆32Mar 26, 2025Updated 11 months ago
- [CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation☆65Jul 29, 2025Updated 7 months ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆61Apr 8, 2024Updated last year
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning☆26Oct 13, 2022Updated 3 years ago
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature.☆35Feb 13, 2025Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆37Aug 18, 2024Updated last year
- Official code for Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions (CVPR 2024)☆28Jun 21, 2024Updated last year
- Flutter repository based on tflite model for image recognition☆30Apr 1, 2022Updated 3 years ago
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)☆34Aug 12, 2024Updated last year
- A curated list of Multimodal Captioning related research(including image captioning, video captioning, and text captioning)☆113Jun 6, 2022Updated 3 years ago
- Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models☆27Nov 29, 2023Updated 2 years ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆37Nov 10, 2024Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆32May 15, 2023Updated 2 years ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Sep 27, 2024Updated last year
- [ACL 2025] The official pytorch implement of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆25May 26, 2025Updated 9 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆33Jul 21, 2023Updated 2 years ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]☆45Jul 22, 2025Updated 7 months ago
- A length-controllable and non-autoregressive image captioning model.☆69Jun 10, 2021Updated 4 years ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)☆198May 9, 2023Updated 2 years ago
- LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections (NeurIPS 2023)☆29Dec 27, 2023Updated 2 years ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆36Aug 8, 2024Updated last year
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"☆75Sep 20, 2023Updated 2 years ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Oct 25, 2024Updated last year
- Some resources (books, paper, video and online courses) about ML,DL,DM☆12Mar 14, 2021Updated 4 years ago