A curated list of zero-shot captioning papers
☆24Aug 26, 2023Updated 2 years ago
Alternatives and similar repositories for awesome-zero-shot-captioning
Users who are interested in awesome-zero-shot-captioning are comparing it to the libraries listed below.
- ☆11Oct 2, 2024Updated last year
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023☆164Sep 9, 2024Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25May 14, 2026Updated 2 weeks ago
- ☆17May 23, 2023Updated 3 years ago
- APIs for 我在校园, with automatically running scripts and multi-user support☆12Jun 28, 2022Updated 3 years ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆37Oct 18, 2023Updated 2 years ago
- CLAIR: A (surprisingly) simple semantic text metric with large language models.☆22Jan 28, 2024Updated 2 years ago
- Efficient Feature Extraction for High-resolution Video Frame Interpolation (BMVC 2022)☆14Aug 24, 2023Updated 2 years ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆60Apr 8, 2024Updated 2 years ago
- ☆16Nov 30, 2023Updated 2 years ago
- The complete codes of the paper "Multimodal Graph Contrastive Learning for Recommendation"☆10Mar 20, 2023Updated 3 years ago
- A curated list of Multimodal Captioning related research(including image captioning, video captioning, and text captioning)☆112Jun 6, 2022Updated 3 years ago
- WWW'24, Mirror Gradient (MG) makes multimodal recommendation models approach flat local minima easier compared to models with normal trai…☆17Nov 1, 2024Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆63Apr 8, 2024Updated 2 years ago
- [SIGIR'25] Code of "Generative Recommender with End-to-End Learnable Item Tokenization".☆32Apr 17, 2025Updated last year
- ☆14May 23, 2022Updated 4 years ago
- ☆13Jun 2, 2023Updated 2 years ago
- Developer project for getting basic API integrations working in under 5 minutes☆11May 22, 2026Updated last week
- Some papers about *diverse* image (a few videos) captioning☆26Apr 4, 2023Updated 3 years ago
- [WSDM 2025] Source code for "Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Cali…☆14Oct 14, 2025Updated 7 months ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆33May 15, 2023Updated 3 years ago
- [CVPR 2026] HiconAgent: History Context-aware Policy Optimization for GUI Agents☆29Mar 9, 2026Updated 2 months ago
- Python codes for mathematical modeling.☆13Sep 5, 2021Updated 4 years ago
- ☆12Aug 8, 2024Updated last year
- Support finetuning GLM4v with zero2☆16Jun 29, 2024Updated last year
- Papers about Explainable AI (Deep Learning-based)☆29Nov 14, 2025Updated 6 months ago
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https…☆22Jul 5, 2024Updated last year
- Flutter repository based on tflite model for image recognition☆30Apr 1, 2022Updated 4 years ago
- ☆14Oct 31, 2022Updated 3 years ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆34Jul 21, 2023Updated 2 years ago
- text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image)☆12Oct 15, 2024Updated last year
- A curated list of researches in object-centric learning☆11Oct 14, 2024Updated last year
- AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management☆31Apr 10, 2026Updated last month
- Brain tumor images classification with ResNet, EfficientNet, EfficientNet_V2 and Compact Convolutional Transformers architectures with Py…☆11Jan 5, 2023Updated 3 years ago
- [CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation☆65Jul 29, 2025Updated 10 months ago
- [CVPR 2024] Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification☆42Mar 6, 2024Updated 2 years ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆64Apr 3, 2026Updated last month
- ☆23Mar 16, 2026Updated 2 months ago
- FeelingBlue: A Corpus for Understanding the Emotional Connotation of Color in Context, accepted at TACL 2022, presented at ACL 2023☆13Dec 28, 2023Updated 2 years ago