jchenghu / ExpansionNet_v2

Implementation code for the paper "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"

☆92

Alternatives and similar repositories for ExpansionNet_v2

Users who are interested in ExpansionNet_v2 are comparing it to the repositories listed below.

davidnvq / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆194 Updated 2 years ago
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆197 Updated last year
jianjieluo / OpenAI-CLIP-Feature
Easy-to-use, efficient code for extracting OpenAI CLIP (global/grid) features from images and text.
☆129 Updated 6 months ago
zarzouram / image_captioning_with_transformers
PyTorch implementation of image captioning using a transformer-based model.
☆66 Updated 2 years ago
dhg-wei / DeCap
ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning
☆136 Updated 2 years ago
aimagelab / camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
☆29 Updated 2 years ago
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆35 Updated 11 months ago
xuguohai / X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
☆165 Updated last year
boheumd / A2Summ
The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)
☆76 Updated 2 years ago
jianjieluo / SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…
☆64 Updated last year
aimagelab / pacscore
[CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆62 Updated 4 months ago
YoadTew / zero-shot-video-to-text
☆76 Updated 2 years ago
RitaRamo / smallcap
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
☆114 Updated last year
sushizixin / CLIP4IDC
CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)
☆34 Updated 2 years ago
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆73 Updated last year
YoadTew / zero-shot-image-to-text
Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
☆278 Updated 2 years ago
sail-sg / ptp
[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
☆152 Updated 2 years ago
YulongBonjour / SimVLM
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
☆36 Updated 2 years ago
amazon-science / mix-generation
MixGen: A New Multi-Modal Data Augmentation
☆124 Updated 2 years ago
Yushi-Hu / PromptCap
Natural language guided image captioning
☆84 Updated last year
microsoft / SwinBERT
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
☆240 Updated 3 years ago
232525 / PureT
Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]
☆67 Updated last year
TalalWasim / Vita-CLIP
Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]
☆120 Updated 2 years ago
microsoft / UniCL
[CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space"
☆401 Updated last year
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆59 Updated 2 years ago
UARK-AICV / VLTinT
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
☆67 Updated last year
LightDXY / FT-CLIP
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
☆219 Updated 2 years ago
SjokerLily / awesome-image-captioning
A paper list of image captioning.
☆22 Updated 3 years ago
ilkerkesen / frozen
A PyTorch implementation of Multimodal Few-Shot Learning with Frozen Language Models with OPT.
☆43 Updated 2 years ago
zhangxuying1004 / RSTNet
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123 Updated 2 years ago