sithu31296 / image-captioning

Simple and Easy to use Image Captioning Implementation

☆9

Alternatives and similar repositories for image-captioning:

Users who are interested in image-captioning are comparing it to the libraries listed below.

aimagelab / camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
☆29 Updated 2 years ago
sushizixin / CLIP4IDC
CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)
☆34 Updated 2 years ago
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆35 Updated 7 months ago
YehLi / TDEN
☆9 Updated 2 years ago
TXH-mercury / COSA
[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆43 Updated 3 months ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆24 Updated 4 months ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆68 Updated 3 years ago
YuanEZhou / satic
☆26 Updated 3 years ago
nirat1606 / OADis
Official code for "Disentangling Visual Embeddings for Attributes and Objects" Published at CVPR 2022
☆35 Updated last year
DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
☆40 Updated last year
aimagelab / pacscore
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. (CVPR 2023)
☆60 Updated 3 weeks ago
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆35 Updated 7 months ago
allenai / reclip
☆83 Updated 2 years ago
Zasder3 / train-CLIP-FT
☆46 Updated 3 years ago
Shahzadnit / EZ-CLIP
☆19 Updated last year
kamwoh / sdc
[BMVC 2023 (Oral)] Official pytorch implementation of the paper: "Unsupervised Hashing with Similarity Distribution Calibration"
☆18 Updated last year
ShiYaya / emscore
Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"
☆25 Updated 2 years ago
GT-RIPL / Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …
☆61 Updated 2 years ago
mzhaoshuai / CenterCLIP
[SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p…
☆130 Updated 2 years ago
Vision-CAIR / RelTransformer
☆29 Updated last year
zhjohnchan / SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
☆30 Updated last year
scwangdyd / large_vocabulary_hoi_detection
Code for ICCV2021: Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection
☆24 Updated 3 years ago
EricWWWW / image-caption-metrics
A Python 3 library for NLP and image-caption metrics: BLEU, METEOR, CIDEr, ROUGE, SPICE, WMD
☆14 Updated 2 years ago
helblazer811 / RefSAM
Referring Image Segmentation Benchmarking with Segment Anything Model (SAM)
☆38 Updated last year
Deferf / CLIP_Video_Representation
Use CLIP to represent video for Retrieval Task
☆69 Updated 4 years ago
yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18 Updated 3 years ago
liunian-harold-li / DesCo
☆30 Updated last year
baaaad / ECE
[ECCV'22 Poster] Explicit Image Caption Editing
☆21 Updated 2 years ago
codezakh / LilT
[ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
☆38 Updated last year
FingerRec / OA-Transformer
[CVPR 2022] The code for our paper "Object-aware Video-language Pre-training for Retrieval"
☆62 Updated 2 years ago