arealgoodname / DiffCap

Official repository for DiffCap: Exploring Continuous Diffusion on Image Captioning

☆8

Alternatives and similar repositories for DiffCap

Users who are interested in DiffCap are comparing it to the repositories listed below.

jianjieluo / SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…
☆64 Updated last year
RERV / UniAdapter
[ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …
☆74 Updated last year
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆14 Updated 2 months ago
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (and a few video) captioning
☆26 Updated 2 years ago
buxiangzhiren / DDCap
☆84 Updated 2 years ago
xu-shitong / diffusion-image-captioning
implementation of paper https://arxiv.org/abs/2210.04559
☆54 Updated 2 years ago
alexandrosXe / A-Simple-Baseline-For-Knowledge-Based-VQA
Repo for the EMNLP 2023 paper "A Simple Baseline for Knowledge-Based Visual Question Answering"
☆22 Updated last year
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆74 Updated last year
joeyz0z / MeaCap
(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
☆49 Updated 11 months ago
lzp870 / RSFD
☆8 Updated 2 years ago
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆35 Updated last year
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆60 Updated last year
GeWu-Lab / TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17 Updated 9 months ago
TengdaHan / AutoAD
[CVPR'23 Highlight] AutoAD: Movie Description in Context.
☆100 Updated 9 months ago
jpthu17 / DiCoSA
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
☆52 Updated last year
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆54 Updated 11 months ago
Lzq5 / Video-Text-Alignment
☆25 Updated 3 weeks ago
LeeYN-43 / Clover
Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)
☆40 Updated 2 years ago
schowdhury671 / meerkat
☆31 Updated last month
ytaek-oh / fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆17 Updated 10 months ago
aimagelab / PMA-Net
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023
☆18 Updated last year
lscpku / VITATECS
☆18 Updated last year
cwj1412 / MSCOCO-Flikcr30K_FG
Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023)
☆25 Updated 2 years ago
fyyCS / LSLD
☆14 Updated last year
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆54 Updated last year
bighuang624 / VoP
[CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval
☆38 Updated 2 years ago
bladewaltz1 / ModeCap
Controllable image captioning model with unsupervised modes
☆21 Updated 2 years ago
vinid / neg_clip
NegCLIP.
☆34 Updated 2 years ago
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆102 Updated 2 years ago
jinhyunj / EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆50 Updated last year