feizc / DeeCap
Dynamic Early Exit for Image Captioning
☆16 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for DeeCap
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23) ☆31 · Updated 7 months ago
- Official Code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022) ☆20 · Updated 2 years ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Updated last year
- Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection. ☆45 · Updated 2 years ago
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning ☆27 · Updated 2 years ago
- Implementation of PyramidCLIP (NeurIPS 2022). ☆27 · Updated 2 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" ☆65 · Updated 3 years ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆70 · Updated 9 months ago
- Official code for the paper "Self-Distillation for Few-Shot Image Captioning" ☆13 · Updated 3 years ago
- Vision-Language Pretraining & Efficient Transformer Papers. ☆14 · Updated 2 years ago
- Deep Multimodal Neural Architecture Search ☆26 · Updated 4 years ago
- ☆19 · Updated last year
- Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers ☆26 · Updated 2 years ago
- AFNet (NeurIPS 2022) ☆19 · Updated last year
- [CVPR-22] This is the official implementation of the paper "Adavit: Adaptive vision transformers for efficient image recognition". ☆49 · Updated 2 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing ☆21 · Updated last year
- ☆43 · Updated 2 years ago
- [CVPR2022 Oral] The official code for "TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognit… ☆18 · Updated 2 years ago
- [CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m… ☆56 · Updated 5 months ago
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations ☆27 · Updated last year
- Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks ☆21 · Updated 2 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles. ☆36 · Updated 2 years ago
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning ☆20 · Updated 11 months ago
- ☆16 · Updated 2 years ago
- 📍 Official pytorch implementation of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS) ☆48 · Updated last year
- This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021) ☆36 · Updated 2 years ago
- [AAAI 2022] This is the official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers" ☆92 · Updated 2 years ago
- Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop) ☆29 · Updated 10 months ago
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration ☆56 · Updated last year