RyanLiut/awesome-diverse-captioning

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/RyanLiut/awesome-diverse-captioning)

RyanLiut / awesome-diverse-captioning

Some papers about *diverse* image (a few videos) captioning

☆25

Alternatives and similar repositories for awesome-diverse-captioning

Users that are interested in awesome-diverse-captioning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

qingzwang / DiversityMetrics
View on GitHub
This is the implementation of self-CIDEr and LSA-based diversity metrics (only for python 2.7).
☆37Feb 26, 2022Updated 4 years ago
yiyang92 / vae_captioning
View on GitHub
Implementation of Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
☆60Apr 5, 2018Updated 8 years ago
visinf / cos-cvae
View on GitHub
Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)
☆37May 16, 2022Updated 4 years ago
yytzsy / SMCG
View on GitHub
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
☆12Apr 14, 2021Updated 5 years ago
StanfordVL / STGraph
View on GitHub
Codebase for CVPR 2020 paper "Spatio-Temporal Graph for Video Captioning with Knowledge Distillation"
☆23Mar 4, 2020Updated 6 years ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
leotam / MIMIC-CXR-annotations
View on GitHub
☆15Aug 4, 2020Updated 5 years ago
bladewaltz1 / ModeCap
View on GitHub
Controllable mage captioning model with unsupervised modes
☆21Apr 14, 2023Updated 3 years ago
jssprz / visual_syntactic_embedding_video_captioning
View on GitHub
Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*
☆30Apr 16, 2021Updated 5 years ago
bcdnlp / FAITHSCORE
View on GitHub
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆34Nov 27, 2025Updated 7 months ago
MarcusNerva / HMN
View on GitHub
[CVPR2022] Official code for Hierarchical Modular Network for Video Captioning. Our proposed HMN is implemented with PyTorch.
☆50Sep 30, 2022Updated 3 years ago
ubc-tea / Local-Superior-Soups
View on GitHub
☆15Dec 10, 2024Updated last year
xfactlab / I0T
View on GitHub
[ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap
☆12Jun 18, 2025Updated last year
ubc-tea / FedTextGrad
View on GitHub
The official implementation of paper "Can Textual Gradient Work in Federated Learning?" accepted at ICLR 2025
☆17Mar 10, 2025Updated last year
zjr2000 / Untrimmed-Video-Feature-Extractor
View on GitHub
A simple and effective feature extractor for untrimmed videos
☆13Sep 1, 2022Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
naver-ai / muco
View on GitHub
Official Pytorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15Apr 16, 2026Updated 3 months ago
ubc-tea / FedSoup
View on GitHub
The official Pytorch implementation of paper "FedSoup: Improving Generalization and Personalization in Federated Learning via Selective M…
☆18Apr 14, 2024Updated 2 years ago
LijunRio / A-Self-Guided-Framework
View on GitHub
This repository contains the code accompanying the paper "A Self-Guided Framework for Radiology Report Generation", accepted by MICCAI 20…
☆20Mar 11, 2024Updated 2 years ago
ytaek-oh / retriever
View on GitHub
☆11Sep 15, 2023Updated 2 years ago
yangyan22 / Medical-Report-Generation-TriNet
View on GitHub
Joint Embedding of Deep Visual and Semantic Features for Medical Image Report Generation
☆18Nov 13, 2025Updated 8 months ago
vinid / neg_clip
View on GitHub
NegCLIP.
☆41Feb 6, 2023Updated 3 years ago
microsoft / SwinBERT
View on GitHub
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
☆251May 26, 2022Updated 4 years ago
NIneeeeeem / LangDC
View on GitHub
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
☆18Sep 7, 2025Updated 10 months ago
ttengwang / PDVC
View on GitHub
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
☆230Jan 3, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Georgelingzj / up-to-date-Vision-Language-Models
View on GitHub
Up-to-date Vision Language Models collection. Mainly focus on computer vision
☆20Feb 9, 2023Updated 3 years ago
nasib-ullah / video-captioning-models-in-Pytorch
View on GitHub
A PyTorch implementation of state of the art video captioning models from 2015-2019 on MSVD and MSRVTT datasets.
☆73Jul 30, 2023Updated 2 years ago
MikeWangWZHL / VidIL
View on GitHub
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117Sep 15, 2022Updated 3 years ago
terry-r123 / Awesome-Captioning
View on GitHub
A curated list of Multimodal Captioning related research(including image captioning, video captioning, and text captioning)
☆113Jun 6, 2022Updated 4 years ago
TencentARC / FLM
View on GitHub
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
☆31May 15, 2023Updated 3 years ago
tgc1997 / Awesome-Video-Captioning
View on GitHub
A curated list of research papers in Video Captioning
☆121Jan 5, 2021Updated 5 years ago
paulgavrikov / vlm_shapebias
View on GitHub
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
☆31Jan 26, 2025Updated last year
ahmetaa / kaldi-jni
View on GitHub
Experiment with JNI access to some Kaldi functions.
☆12Dec 31, 2018Updated 7 years ago
zjr2000 / LLMVA-GEBC
View on GitHub
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
☆29Jan 1, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
zjr2000 / GVL
View on GitHub
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
☆28Dec 8, 2023Updated 2 years ago
kaist-ami / AVHBench
View on GitHub
[ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"
☆25Mar 8, 2026Updated 4 months ago
bronyayang / HallE_Control
View on GitHub
HallE-Control: Controlling Object Hallucination in LMMs
☆32Apr 10, 2024Updated 2 years ago
kwonjunn01 / Hi-Mapper
View on GitHub
☆19Nov 29, 2024Updated last year
au-revoir / model-editing-ft
View on GitHub
☆13Sep 8, 2024Updated last year
zchoi / VCRN
View on GitHub
☆11Jul 11, 2023Updated 3 years ago
zkysfls / 2024-sbdd-benchmark
View on GitHub
☆13Jan 25, 2026Updated 6 months ago