arealgoodname / DiffCap
Official repository for DiffCap: Exploring Continuous Diffusion on Image Captioning
☆7Updated last year
Related projects
Alternatives and complementary repositories for DiffCap
- [CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…☆56Updated 5 months ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning☆37Updated 2 months ago
- ☆85Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆69Updated 9 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 3 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆34Updated 7 months ago
- ☆18Updated last month
- Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023)☆22Updated last year
- ☆21Updated last month
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"☆73Updated last year
- Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos☆19Updated 4 months ago
- ☆27Updated last year
- implementation of paper https://arxiv.org/abs/2210.04559☆54Updated 2 years ago
- Controllable image captioning model with unsupervised modes☆21Updated last year
- ☆24Updated 4 months ago
- (TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information☆22Updated 3 months ago
- ☆60Updated last year
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆14Updated 2 weeks ago
- ☆33Updated 2 years ago
- [ICCV 2023] Accurate and Fast Compressed Video Captioning☆34Updated 8 months ago
- ☆13Updated last year
- Some papers about *diverse* image (a few videos) captioning☆25Updated last year
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆34Updated 6 months ago
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)☆48Updated last year
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)☆58Updated 9 months ago
- The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS…☆21Updated 5 months ago
- Vision Transformers are Parameter-Efficient Audio-Visual Learners☆85Updated last year
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…☆39Updated 2 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆52Updated last year
- ☆55Updated last year