FeiElysia / ViECap
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023
☆147 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for ViECap
- Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval -- ICCV2023 Oral ☆90 · Updated last year
- MomentDiff: Generative Video Moment Retrieval from Random to Real -- NeurIPS 2023 ☆75 · Updated last year
- Accepted by ICCV2023, Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-bas… ☆103 · Updated 6 months ago
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering ☆72 · Updated last year
- Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text … ☆12 · Updated 2 months ago
- [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations ☆121 · Updated 7 months ago
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection (TMM 2023) ☆95 · Updated last year
- ☆84 · Updated last year
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering ☆15 · Updated last year
- ☆86 · Updated 4 months ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment ☆48 · Updated 7 months ago
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆73 · Updated last year
- [CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning ☆109 · Updated 7 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆42 · Updated 4 months ago
- PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆61 · Updated 5 months ago
- ☆89 · Updated last year
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval ☆68 · Updated 7 months ago
- Source code of our CVPR2024 paper TeachCLIP for Text-to-Video Retrieval ☆22 · Updated 3 weeks ago
- Code release for Your “On-the-fly Category Discovery (CVPR 2023)” ☆51 · Updated last year
- [ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model ☆120 · Updated 7 months ago
- [CVPR 2024] SimDA: Simple Diffusion Adapter for Efficient Video Generation ☆118 · Updated 6 months ago
- Context-I2W: Mapping Images to Context-dependent words for Accurate Zero-Shot Composed Image Retrieval [AAAI 2024 Oral] ☆39 · Updated 7 months ago
- [ACM MM 2021 Oral] Official repo of "Neighbor-view Enhanced Model for Vision and Language Navigation" ☆79 · Updated 2 years ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆29 · Updated 7 months ago
- [ICCV-2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation ☆96 · Updated 9 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆64 · Updated last month
- SeqTR: A Simple yet Universal Network for Visual Grounding ☆131 · Updated 3 weeks ago
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆95 · Updated 9 months ago