davidnvq / gritLinks

GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)

☆197

Alternatives and similar repositories for grit

Users that are interested in grit are comparing it to the libraries listed below

Sorting:

GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆60Updated 3 years ago
jacobswan1 / ViTCAP
Implementation for CVPR 2022 paper " Injecting Semantic Concepts into End-to-End Image Captionin".
☆43Updated 3 years ago
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆201Updated last year
jianjieluo / OpenAI-CLIP-Feature
An easy to use, user-friendly and efficient code for extracting OpenAI CLIP (Global/Grid) features from image and text respectively.
☆134Updated 10 months ago
232525 / PureT
Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]
☆69Updated last year
zhangxuying1004 / RSTNet
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123Updated 2 years ago
microsoft / SwinBERT
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
☆245Updated 3 years ago
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆74Updated 2 years ago
terry-r123 / Awesome-Captioning
A curated list of Multimodal Captioning related research(including image captioning, video captioning, and text captioning)
☆112Updated 3 years ago
Yushi-Hu / PromptCap
natual language guided image captioning
☆86Updated last year
SjokerLily / awesome-image-captioning
A paper list of image captioning.
☆22Updated 3 years ago
ttengwang / PDVC
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
☆226Updated last year
RitaRamo / smallcap
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
☆125Updated last year
xuguohai / X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
☆175Updated last year
dhg-wei / DeCap
ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning
☆137Updated 2 years ago
BryanPlummer / flickr30k_entities
Flickr30K Entities Dataset
☆181Updated 6 years ago
uta-smile / TCL
code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
☆267Updated last year
jianjieluo / SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…
☆67Updated last year
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆282Updated 2 years ago
buxiangzhiren / DDCap
☆85Updated 2 years ago
djiajunustc / TransVG
☆193Updated last year
jchenghu / ExpansionNet_v2
Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
☆93Updated 10 months ago
UARK-AICV / VLTinT
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
☆68Updated last year
rentainhe / TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
☆68Updated 4 years ago
amazon-science / mix-generation
MixGen: A New Multi-Modal Data Augmentation
☆126Updated 2 years ago
nku-shengzheliu / Pytorch-TransVG
An unofficial pytorch implementation of "TransVG: End-to-End Visual Grounding with Transformers".
☆52Updated 4 years ago
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆207Updated 2 years ago
allenai / reclip
☆88Updated 3 years ago
sail-sg / ptp
[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
☆151Updated 2 years ago
mrwu-mac / DIFNet
[CVPR 2022] This repository is for the paper ``DIFNet: Boosting Visual Information Flow for Image Captioning'' .
☆20Updated 2 years ago