milkymap / transformer-image-captioning

Implementation of the paper "CPTR: Full Transformer Network for Image Captioning"

☆30

Alternatives and similar repositories for transformer-image-captioning

Users who are interested in transformer-image-captioning compare it with the repositories listed below

davidnvq / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆194 Updated 2 years ago
zarzouram / image_captioning_with_transformers
PyTorch implementation of image captioning using a transformer-based model.
☆66 Updated 2 years ago
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆197 Updated last year
232525 / PureT
Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]
☆67 Updated last year
jianjieluo / SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…
☆64 Updated last year
RitaRamo / smallcap
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
☆114 Updated last year
SjokerLily / awesome-image-captioning
A paper list of image captioning.
☆22 Updated 3 years ago
jacobswan1 / ViTCAP
Implementation for the CVPR 2022 paper "Injecting Semantic Concepts into End-to-End Image Captioning".
☆43 Updated 3 years ago
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆73 Updated last year
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆59 Updated 2 years ago
ttengwang / PDVC
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
☆219 Updated last year
xuguohai / X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
☆165 Updated last year
jianjieluo / OpenAI-CLIP-Feature
Easy-to-use, efficient code for extracting OpenAI CLIP (Global/Grid) features from images and text.
☆129 Updated 6 months ago
microsoft / SwinBERT
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
☆240 Updated 3 years ago
terry-r123 / Awesome-Captioning
A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
☆109 Updated 3 years ago
Yushi-Hu / PromptCap
Natural language guided image captioning
☆84 Updated last year
RoyalSkye / Image-Caption
Using LSTM or Transformer to solve image captioning in PyTorch
☆78 Updated 3 years ago
jchenghu / ExpansionNet_v2
Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
☆92 Updated 6 months ago
jssprz / video_captioning_datasets
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Pe…
☆126 Updated last year
saahiluppal / catr
Image Captioning Using Transformer
☆268 Updated 3 years ago
yaolinli / IDC
☆28 Updated 2 years ago
zhangxuying1004 / RSTNet
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123 Updated 2 years ago
dhg-wei / DeCap
ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning
☆136 Updated 2 years ago
junyangwang0410 / Knight
SotA text-only image/video method (IJCAI 2023)
☆16 Updated last year
phellonchen / awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
☆292 Updated 2 years ago
uta-smile / TCL
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
☆264 Updated 9 months ago
aimagelab / meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
☆540 Updated 2 years ago
salaniz / pycocoevalcap
Python 3 support for the MS COCO caption evaluation tools
☆321 Updated 11 months ago
BryanPlummer / flickr30k_entities
Flickr30K Entities Dataset
☆177 Updated 6 years ago
Aldenhovel / bleu-rouge-meteor-cider-spice-eval4imagecaption
Evaluation tools for image captioning, including BLEU, ROUGE-L, CIDEr, METEOR, and SPICE scores.
☆30 Updated 2 years ago