feizc / DeeCap
Dynamic Early Exit for Image Captioning
☆17Updated 2 years ago
Alternatives and similar repositories for DeeCap:
Users who are interested in DeeCap are comparing it to the repositories listed below.
- 📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)☆52Updated last year
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning☆20Updated last year
- Official Code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022)☆19Updated 2 years ago
- ☆16Updated 2 years ago
- ☆22Updated last year
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Updated last year
- ☆35Updated 2 years ago
- [CVPR 2022] This repository is for the paper "DIFNet: Boosting Visual Information Flow for Image Captioning".☆20Updated 2 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"☆66Updated 3 years ago
- Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks☆21Updated 2 years ago
- Colorful Prompt Tuning for Pre-trained Vision-Language Models☆48Updated 2 years ago
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23)☆33Updated 10 months ago
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning☆26Updated 2 years ago
- ☆35Updated 2 years ago
- [arXiv] Cross-Modal Adapter for Text-Video Retrieval☆55Updated 2 years ago
- ☆34Updated last year
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆48Updated 2 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆21Updated 2 years ago
- Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023☆48Updated last year
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning☆45Updated 9 months ago
- Lightweight Transformer for Multi-modal Tasks☆15Updated 2 years ago
- ☆22Updated 2 years ago
- [CVPR-22] This is the official implementation of the paper "AdaViT: Adaptive Vision Transformers for Efficient Image Recognition".☆51Updated 2 years ago
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”☆30Updated 10 months ago
- [CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong C…☆25Updated 2 years ago
- Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models☆46Updated last year
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations☆27Updated 2 years ago
- Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection.☆46Updated 2 years ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆22Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆29Updated last year