feizc / DeeCap
Dynamic Early Exit for Image Captioning
☆17 · Updated 2 years ago
Alternatives and similar repositories for DeeCap:
Users interested in DeeCap are comparing it to the repositories listed below.
- Official code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM 2022) ☆19 · Updated 2 years ago
- Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS) ☆52 · Updated last year
- [CVPR-22] Official implementation of the paper "AdaViT: Adaptive Vision Transformers for Efficient Image Recognition" ☆51 · Updated 2 years ago
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations ☆27 · Updated 2 years ago
- Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks ☆22 · Updated 2 years ago
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning ☆20 · Updated last year
- This repository contains two tools: a Python 3 library for NLP and image-caption metrics, and code for a two-tailed t-test with paired samples. It wil… ☆18 · Updated 4 years ago
- [ICLR 2024] Exploring Target Representations for Masked Autoencoders ☆55 · Updated last year
- [arXiv] Cross-Modal Adapter for Text-Video Retrieval ☆55 · Updated 2 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing ☆22 · Updated 2 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" ☆66 · Updated 3 years ago
- Vision-Language Pretraining & Efficient Transformer Papers. ☆14 · Updated 3 years ago
- Lightweight Transformer for Multi-modal Tasks ☆16 · Updated 2 years ago
- Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models" ☆33 · Updated 2 years ago
- Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers ☆26 · Updated 3 years ago
- Cross Modal Retrieval with Querybank Normalisation ☆55 · Updated last year
- Implementation of our IJCAI 2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning. ☆22 · Updated last year
- Source code for the EMNLP 2022 paper "PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models" ☆48 · Updated 2 years ago
- Research code for the CVPR 2022 paper "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching" ☆25 · Updated 2 years ago
- Adaptive Offline Quintuplet Loss for Image-Text Matching (AOQ) ☆34 · Updated 4 years ago
- PyTorch implementation of the CVPR 2021 paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning" ☆87 · Updated 3 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles. ☆36 · Updated 3 years ago