researchmm/generate-it

A collection of models for image<->text generation in ACM MM 2021.

☆67

Alternatives and similar repositories for generate-it

Users interested in generate-it are comparing it to the libraries listed below.

LibertFan / ImageCaption
Bridging by Word: Image-Grounded Vocabulary Construction for Visual Captioning (ACL 2019)
☆17 · Sep 8, 2019 · Updated 6 years ago
fenglinliu98 / MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
☆65 · Oct 19, 2020 · Updated 5 years ago
allenai / x-lxmert
PyTorch code for EMNLP 2020 paper "X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers"
☆50 · Aug 27, 2021 · Updated 4 years ago
princetonvisualai / SPICE-U
☆11 · Sep 7, 2020 · Updated 5 years ago
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆45 · Jun 14, 2024 · Updated 2 years ago
dongdongdong666 / CPGAN
The method of text-to-image
☆48 · Dec 19, 2019 · Updated 6 years ago
njucckevin / KnowCap
Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
☆13 · Feb 15, 2024 · Updated 2 years ago
Oneplus / ELMo
☆10 · May 20, 2019 · Updated 7 years ago
BigRedT / vico
Multi-sense word embeddings from visual co-occurrences
☆25 · Sep 5, 2019 · Updated 6 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆69 · Jun 10, 2021 · Updated 5 years ago
MinfengZhu / DM-GAN
☆194 · Aug 8, 2022 · Updated 3 years ago
MILVLG / rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
☆57 · Jun 13, 2023 · Updated 3 years ago
dougsouza / efficient-t2i
Official implementation of the paper Efficient Neural Architecture for Text-to-Image Synthesis.
☆16 · Jun 8, 2022 · Updated 4 years ago
ck0123 / improved-bertscore-for-image-captioning-evaluation
☆21 · Jul 25, 2024 · Updated last year
Magnety / Multi_modal_Image
☆12 · Oct 11, 2024 · Updated last year
Gitsamshi / WeakVRD-Captioning
Implementation of paper "Improving Image Captioning with Better Use of Caption"
☆33 · Sep 15, 2020 · Updated 5 years ago
j-min / DallEval
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)
☆143 · Jun 10, 2025 · Updated last year
yangxuntu / SGAE
☆218 · Feb 26, 2022 · Updated 4 years ago
gnobitab / FuseDream
☆194 · Dec 7, 2021 · Updated 4 years ago
yxuansu / MAGIC
Language Models Can See: Plugging Visual Controls in Text Generation
☆261 · Jun 1, 2022 · Updated 4 years ago
nttmdlab-nlp / VisualMRC
VisualMRC: Machine Reading Comprehension on Document Images (AAAI 2021)
☆57 · Mar 31, 2025 · Updated last year
huiyegit / T2I_CL
☆45 · Dec 26, 2021 · Updated 4 years ago
xiye17 / EvalQAExpl
Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.
☆17 · Apr 25, 2021 · Updated 5 years ago
warmspringwinds / segmentation_in_style
https://arxiv.org/abs/2107.12518
☆146 · Aug 15, 2022 · Updated 3 years ago
YuanEZhou / Grounded-Image-Captioning
☆64 · Jan 5, 2022 · Updated 4 years ago
aimagelab / show-control-and-tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
☆281 · Dec 21, 2022 · Updated 3 years ago
ChenyuGAO-CS / SMA
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11 · Nov 30, 2021 · Updated 4 years ago
google / localized-narratives
Localized Narratives
☆86 · Sep 9, 2021 · Updated 4 years ago
google-research-datasets / maxm
MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…
☆13 · Jan 16, 2024 · Updated 2 years ago
linjieli222 / VQA_ReGAT
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
☆187 · Apr 15, 2021 · Updated 5 years ago
princeton-nlp / MADE
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering
☆68 · Nov 26, 2021 · Updated 4 years ago
husthuaan / AoANet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
☆339 · May 2, 2021 · Updated 5 years ago
Dong-JinKim / DenseRelationalCaptioning
Code of Dense Relational Captioning
☆69 · Feb 23, 2023 · Updated 3 years ago
JIA-Lab-research / SCGAN
The implementation of 'Image synthesis via semantic composition', ICCV 2021.
☆83 · Mar 3, 2023 · Updated 3 years ago
zyf12389 / LayoutGAN-Alpha
Implementation of LayoutGAN https://arxiv.org/abs/1901.06767
☆17 · May 12, 2019 · Updated 7 years ago
LuoweiZhou / VLP
Vision-Language Pre-training for Image Captioning and Question Answering
☆420 · Jan 18, 2022 · Updated 4 years ago
MCLAB-OCR / KnowledgeMiningWithSceneText
☆38 · Feb 4, 2023 · Updated 3 years ago
m-bain / CondensedMovies-chall
Condensed Movies Challenge 2021
☆22 · Sep 21, 2022 · Updated 3 years ago
hwanheelee1993 / ViLBERTScore
Code for ViLBERTScore in EMNLP Eval4NLP
☆18 · Oct 27, 2022 · Updated 3 years ago