bladewaltz1/ModeCap

Controllable image captioning model with unsupervised modes

☆21

Alternatives and similar repositories for ModeCap

Users who are interested in ModeCap are comparing it to the repositories listed below.

yytzsy / SMCG
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
☆12 · Apr 14, 2021 · Updated 5 years ago
bladewaltz1 / PromptSwitch
☆30 · Aug 14, 2023 · Updated 2 years ago
princetonvisualai / SPICE-U
☆11 · Sep 7, 2020 · Updated 5 years ago
aimagelab / PMA-Net
[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
☆19 · Jun 7, 2024 · Updated 2 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆69 · Jun 10, 2021 · Updated 5 years ago
yafuly / SyntacticGen
☆16 · Jul 11, 2023 · Updated 3 years ago
YtongXie / X-RGen
[ACCV2024 (Oral)] Official pytorch implementation of X-RGen
☆18 · Jan 20, 2025 · Updated last year
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
visinf / cos-cvae
Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)
☆37 · May 16, 2022 · Updated 4 years ago
Yinghao-Li / GuiGen
☆14 · Oct 6, 2020 · Updated 5 years ago
qingzwang / DiversityMetrics
This is the implementation of self-CIDEr and LSA-based diversity metrics (only for Python 2.7).
☆37 · Feb 26, 2022 · Updated 4 years ago
WebVLN / WebVLN
Official implementation of WebVLN: Vision-and-Language Navigation on Websites
☆35 · Jan 2, 2024 · Updated 2 years ago
NovaMind-Z / PTSN
Repository for an end-to-end image captioning method PTSN (ACM MM22).
☆60 · Dec 11, 2022 · Updated 3 years ago
chenqi008 / V2C
Pytorch implementation for “V2C: Visual Voice Cloning”
☆34 · Jan 28, 2023 · Updated 3 years ago
YuanEZhou / CBTrans
☆24 · Apr 4, 2022 · Updated 4 years ago
yj-yu / lsmdc
☆33 · Nov 12, 2018 · Updated 7 years ago
aimagelab / camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
☆30 · Dec 1, 2022 · Updated 3 years ago
jianjieluo / SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…
☆68 · Jun 11, 2024 · Updated 2 years ago
ShiYaya / emscore
Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"
☆26 · Oct 20, 2022 · Updated 3 years ago
zzxslp / XL-VLN
Dataset for Bilingual VLN
☆11 · Dec 5, 2020 · Updated 5 years ago
buxiangzhiren / DDCap
☆85 · Dec 4, 2022 · Updated 3 years ago
e-bug / cross-modal-ablation
[EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers…
☆20 · Jan 17, 2022 · Updated 4 years ago
jinpeng0528 / SEFE
☆13 · May 6, 2025 · Updated last year
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆61 · Oct 21, 2022 · Updated 3 years ago
ajinkya98 / PyTorchCNN
Implementing CNN in PyTorch with Custom Dataset and Transfer Learning
☆11 · Aug 24, 2020 · Updated 5 years ago
ylqi / GL-RG
The code of IJCAI22 paper "GL-RG: Global-Local Representation Granularity for Video Captioning".
☆18 · May 10, 2023 · Updated 3 years ago
YanyuanQiao / HOP-REVERIE-Challenge
Baseline for REVERIE-Challenge using HOP
☆10 · Jul 4, 2022 · Updated 4 years ago
zinengtang / DeCEMBERT
Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
☆17 · Jan 12, 2023 · Updated 3 years ago
hwanheelee1993 / ViLBERTScore
Code for ViLBERTScore in EMNLP Eval4NLP
☆18 · Oct 27, 2022 · Updated 3 years ago
jiahuei / sparse-image-captioning
Image captioning with weight pruning in PyTorch
☆22 · Jan 14, 2022 · Updated 4 years ago
CSC2548 / image_caption_gan
☆10 · May 4, 2018 · Updated 8 years ago
yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18 · Sep 17, 2021 · Updated 4 years ago
kayburns / women-snowboard
☆19 · Nov 22, 2022 · Updated 3 years ago
baaaad / ECE
[ECCV'22 Poster] Explicit Image Caption Editing
☆22 · Nov 30, 2022 · Updated 3 years ago
rmcong / CoADNet_NeurIPS20
CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection
☆19 · Jan 8, 2021 · Updated 5 years ago
YicongHong / Recurrent-VLN-BERT
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
☆208 · Aug 13, 2022 · Updated 3 years ago
TerminologyHub / termhub-in-5-minutes
Developer project for getting basic API integrations working in under 5 minutes
☆11 · May 22, 2026 · Updated 2 months ago
yiyang92 / vae_captioning
Implementation of Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
☆60 · Apr 5, 2018 · Updated 8 years ago
zchoi / VCRN
☆11 · Jul 11, 2023 · Updated 3 years ago