Vision-CAIR/VisualGPT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Vision-CAIR/VisualGPT)

Vision-CAIR / VisualGPT

VisualGPT, CVPR 2022 Proceeding, GPT as a decoder for vision-language models

☆342

Alternatives and similar repositories for VisualGPT

Users that are interested in VisualGPT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
View on GitHub
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18Sep 17, 2021Updated 4 years ago
husthuaan / AoANet
View on GitHub
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
☆339May 2, 2021Updated 5 years ago
yahoo / object_relation_transformer
View on GitHub
Implementation of the Object Relation Transformer for Image Captioning
☆180Sep 17, 2024Updated last year
aimagelab / meshed-memory-transformer
View on GitHub
Meshed-Memory Transformer for Image Captioning. CVPR 2020
☆546Dec 21, 2022Updated 3 years ago
Jhhuangkay / DeepOpht-Medical-Report-Generation-for-Retinal-Images-via-Deep-Models-and-Visual-Explanation
View on GitHub
☆46Aug 27, 2025Updated 11 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
zhangxuying1004 / RSTNet
View on GitHub
Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)
☆123Dec 17, 2022Updated 3 years ago
LuoweiZhou / VLP
View on GitHub
Vision-Language Pre-training for Image Captioning and Question Answering
☆420Jan 18, 2022Updated 4 years ago
luo3300612 / image-captioning-DLCT
View on GitHub
Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
☆203Jun 8, 2022Updated 4 years ago
forence / Awesome-Visual-Captioning
View on GitHub
This repository focus on Image Captioning & Video Captioning & Seq-to-Seq Learning & NLP
☆410Nov 14, 2022Updated 3 years ago
j-min / VL-T5
View on GitHub
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
☆372Jul 29, 2023Updated 3 years ago
researchmm / soho
View on GitHub
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
☆208Sep 30, 2022Updated 3 years ago
facebookresearch / connect-caption-and-trace
View on GitHub
A unified framework to jointly model images, text, and human attention traces.
☆80May 24, 2021Updated 5 years ago
cshizhe / asg2cap
View on GitHub
Code accompanying the paper "Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs" (Chen et al., …
☆200Dec 1, 2022Updated 3 years ago
pzzhang / VinVL
View on GitHub
project page for VinVL
☆360Jul 26, 2023Updated 3 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
ylsung / VL_adapter
View on GitHub
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆212Dec 18, 2022Updated 3 years ago
facebookresearch / grid-feats-vqa
View on GitHub
Grid features pre-training code for visual question answering
☆269Sep 17, 2021Updated 4 years ago
microsoft / Oscar
View on GitHub
Oscar and VinVL
☆1,054Aug 28, 2023Updated 2 years ago
kdexd / virtex
View on GitHub
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
☆561Aug 22, 2025Updated 11 months ago
GT-RIPL / Xmodal-Ctx
View on GitHub
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆61Oct 21, 2022Updated 3 years ago
quangvnai / grit
View on GitHub
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆199May 9, 2023Updated 3 years ago
OFA-Sys / OFA
View on GitHub
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…
☆2,557Apr 24, 2024Updated 2 years ago
krasserm / fairseq-image-captioning
View on GitHub
Transformer-based image captioning extension for pytorch/fairseq
☆319Dec 18, 2020Updated 5 years ago
JDAI-CV / image-captioning
View on GitHub
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
☆273Jul 27, 2021Updated 5 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
luo3300612 / Semantics-AssistedVideoCaptioning.pytorch
View on GitHub
pytorch implementation of Semantics-AssistedVideoCaptioning
☆11Feb 16, 2023Updated 3 years ago
yangxuntu / SGAE
View on GitHub
☆218Feb 26, 2022Updated 4 years ago
baaaad / ECE
View on GitHub
[ECCV'22 Poster] Explicit Image Caption Editing
☆22Nov 30, 2022Updated 3 years ago
Vision-CAIR / RelTransformer
View on GitHub
☆29Oct 4, 2023Updated 2 years ago
mehdidc / DALLE_clip_score
View on GitHub
Simple script to compute CLIP-based scores given a DALL-e trained model.
☆29Jun 13, 2021Updated 5 years ago
tgisaturday / dalle-lightning
View on GitHub
Refactoring dalle-pytorch and taming-transformers for TPU VM
☆60Aug 30, 2021Updated 4 years ago
MILVLG / bottom-up-attention.pytorch
View on GitHub
A PyTorch reimplementation of bottom-up-attention models
☆302Apr 7, 2022Updated 4 years ago
jacobswan1 / ViTCAP
View on GitHub
Implementation for CVPR 2022 paper " Injecting Semantic Concepts into End-to-End Image Captionin".
☆43May 28, 2022Updated 4 years ago
Vision-CAIR / ChatCaptioner
View on GitHub
Official Repository of ChatCaptioner
☆468Apr 13, 2023Updated 3 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
yxuansu / MAGIC
View on GitHub
Language Models Can See: Plugging Visual Controls in Text Generation
☆260Jun 1, 2022Updated 4 years ago
LibertFan / ImageCaption
View on GitHub
Bridging by Word: Image-Grounded Vocabulary Construction for Visual Captioning based in ACL2019
☆17Sep 8, 2019Updated 6 years ago
MILVLG / mt-captioning
View on GitHub
A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning
☆25Sep 4, 2020Updated 5 years ago
jayleicn / recurrent-transformer
View on GitHub
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
☆170Dec 4, 2020Updated 5 years ago
intersun / LightningDOT
View on GitHub
source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT
☆72Nov 14, 2022Updated 3 years ago
clip-vil / CLIP-ViL
View on GitHub
[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
☆419Oct 28, 2022Updated 3 years ago
rmokady / CLIP_prefix_caption
View on GitHub
Simple image captioning model
☆1,421Jun 9, 2024Updated 2 years ago