FeiElysia / awesome-zero-shot-captioning
A curated list of zero-shot captioning papers
☆24 · Aug 26, 2023 · Updated 2 years ago
Alternatives and similar repositories for awesome-zero-shot-captioning
Users who are interested in awesome-zero-shot-captioning are comparing it to the repositories listed below
- ☆11 · Oct 2, 2024 · Updated last year
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023 ☆162 · Sep 9, 2024 · Updated last year
- Efficient Feature Extraction for High-resolution Video Frame Interpolation (BMVC 2022) ☆13 · Aug 24, 2023 · Updated 2 years ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆25 · Nov 23, 2024 · Updated last year
- ☆15 · Nov 30, 2023 · Updated 2 years ago
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024. ☆34 · Sep 26, 2024 · Updated last year
- ☆15 · May 23, 2022 · Updated 3 years ago
- ☆16 · May 23, 2023 · Updated 2 years ago
- CLAIR: A (surprisingly) simple semantic text metric with large language models. ☆21 · Jan 28, 2024 · Updated 2 years ago
- ☆14 · Oct 31, 2022 · Updated 3 years ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer ☆37 · Oct 18, 2023 · Updated 2 years ago
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https… ☆22 · Jul 5, 2024 · Updated last year
- [ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ☆19 · Jun 7, 2024 · Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆60 · Apr 8, 2024 · Updated last year
- [ECCV'22 Poster] Explicit Image Caption Editing ☆22 · Nov 30, 2022 · Updated 3 years ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆32 · Mar 26, 2025 · Updated 10 months ago
- A curated list of papers, datasets and resources pertaining to zero-shot object detection. ☆29 · Mar 15, 2023 · Updated 2 years ago
- Some papers about *diverse* image (a few videos) captioning ☆26 · Apr 4, 2023 · Updated 2 years ago
- Papers about Explainable AI (Deep Learning-based) ☆29 · Nov 14, 2025 · Updated 3 months ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) ☆60 · Apr 8, 2024 · Updated last year
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning ☆26 · Oct 13, 2022 · Updated 3 years ago
- Flutter repository based on tflite model for image recognition ☆30 · Apr 1, 2022 · Updated 3 years ago
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆33 · Aug 12, 2024 · Updated last year
- A curated list of Multimodal Captioning related research (including image captioning, video captioning, and text captioning) ☆112 · Jun 6, 2022 · Updated 3 years ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆37 · Nov 10, 2024 · Updated last year
- Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling ☆26 · Oct 26, 2019 · Updated 6 years ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆29 · Sep 27, 2024 · Updated last year
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection". ☆26 · May 26, 2025 · Updated 8 months ago
- A length-controllable and non-autoregressive image captioning model. ☆69 · Jun 10, 2021 · Updated 4 years ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆198 · May 9, 2023 · Updated 2 years ago
- LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections (NeurIPS 2023) ☆29 · Dec 27, 2023 · Updated 2 years ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆36 · Aug 8, 2024 · Updated last year
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆75 · Sep 20, 2023 · Updated 2 years ago
- Some resources (books, papers, videos and online courses) about ML, DL, DM ☆12 · Mar 14, 2021 · Updated 4 years ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆80 · Oct 25, 2024 · Updated last year
- [CVPR 2024] Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification ☆39 · Mar 6, 2024 · Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆42 · Mar 11, 2025 · Updated 11 months ago
- ☆31 · Oct 25, 2021 · Updated 4 years ago
- These are the official datasets used on the Medicare.gov Hospital Compare Website provided by the Centers for Medicare & Medicaid Service… ☆10 · Mar 12, 2018 · Updated 7 years ago