feizc / DeeCap
Dynamic Early Exit for Image Captioning
☆17Updated 2 years ago
Alternatives and similar repositories for DeeCap
Users who are interested in DeeCap are comparing it to the libraries listed below
- ☆16Updated 2 years ago
- Official Code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022)☆19Updated 2 years ago
- 📍 Official pytorch implementation of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)☆52Updated last year
- Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks☆22Updated 2 years ago
- Pytorch implementation of paper "Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning".☆9Updated 2 years ago
- Lightweight Transformer for Multi-modal Tasks☆16Updated 2 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"☆66Updated 3 years ago
- [CVPR 2022] This repository is for the paper "DIFNet: Boosting Visual Information Flow for Image Captioning".☆20Updated 2 years ago
- [arXiv] Cross-Modal Adapter for Text-Video Retrieval☆55Updated 2 years ago
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning☆20Updated last year
- Implementation for CVPR 2022 paper "Injecting Semantic Concepts into End-to-End Image Captioning".☆43Updated 3 years ago
- An unofficial pytorch implementation of "TransVG: End-to-End Visual Grounding with Transformers".☆52Updated 4 years ago
- [ICLR2024] Exploring Target Representations for Masked Autoencoders☆56Updated last year
- ☆31Updated 4 years ago
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated 2 years ago
- Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection.☆47Updated 2 years ago
- CVPR2022 - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation☆23Updated 2 years ago
- Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers☆26Updated 3 years ago
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning☆26Updated 2 years ago
- The Pytorch implementation for "Video-Text Pre-training with Learned Regions"☆42Updated 2 years ago
- Official code for the paper "Self-Distillation for Few-Shot Image Captioning"☆14Updated 4 years ago
- [ICCV 2023 oral] This is the official repository for our paper: "Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning".☆71Updated last year
- ☆35Updated last year
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations☆27Updated 2 years ago
- Microsoft COCO Caption Evaluation Tool - Python 3☆33Updated 6 years ago
- [CVPR2022 Oral] The official code for "TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognit…☆18Updated 2 years ago
- ☆39Updated 3 years ago
- ☆9Updated 2 years ago
- Accepted by CVPR 2020.☆27Updated 11 months ago
- Some papers about *diverse* image (a few videos) captioning☆26Updated 2 years ago