DavidMChan/caption-by-committee

DavidMChan / caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.

☆42

Alternatives and similar repositories for caption-by-committee

Users who are interested in caption-by-committee are comparing it to the libraries listed below.

DavidMChan / clair
CLAIR: A (surprisingly) simple semantic text metric with large language models.
☆22 · Jan 28, 2024 · Updated 2 years ago
InfiMM / mllm-hd
Official code for infimm-hd
☆16 · Sep 4, 2024 · Updated last year
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆66 · Jul 29, 2025 · Updated 11 months ago
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40 · Apr 18, 2025 · Updated last year
microsoft / multimodal-aligned-recipe-corpus
☆18 · Jun 5, 2024 · Updated 2 years ago
OPTML-Group / VLM-Safety-Unlearn
[ICLR26] Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
☆21 · Apr 16, 2026 · Updated 3 months ago
quangvnai / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆199 · May 9, 2023 · Updated 3 years ago
THUSE-Course / course-index
☆11 · Mar 3, 2026 · Updated 4 months ago
fawazsammani / show-edit-tell
Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020
☆82 · Jul 17, 2020 · Updated 6 years ago
wenqi-wang20 / jittor-ThisNameIsGeneratedByJittor-Landscape
The 2nd Jittor Artificial Intelligence Challenge: a Jittor-based competition on generating landscape images from sketches
☆10 · Jan 28, 2023 · Updated 3 years ago
njucckevin / KnowCap
Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
☆13 · Feb 15, 2024 · Updated 2 years ago
YoadTew / zero-shot-image-to-text
Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
☆279 · Sep 17, 2022 · Updated 3 years ago
kayburns / women-snowboard
☆19 · Nov 22, 2022 · Updated 3 years ago
amirbar / StoP
☆12 · Jun 26, 2024 · Updated 2 years ago
megvii-research / protoclip
📍 Official repository of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS 2023)
☆56 · Nov 8, 2023 · Updated 2 years ago
junyangwang0410 / Knight
SotA text-only image/video method (IJCAI 2023)
☆14 · Jan 9, 2024 · Updated 2 years ago
abhrac / data-free-sbir
Official implementation of Data-Free Sketch-Based Image Retrieval, CVPR 2023.
☆28 · Sep 8, 2023 · Updated 2 years ago
RitaRamo / extra
Retrieval-augmented Image Captioning
☆13 · Feb 16, 2023 · Updated 3 years ago
malmaud / whats_cookin
Dataset generated by the methods in "What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision"
☆21 · May 27, 2015 · Updated 11 years ago
ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021]
☆57 · Apr 5, 2022 · Updated 4 years ago
jiasenlu / bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
☆23 · Aug 22, 2019 · Updated 6 years ago
abmfy / wordle
A Wordle game written in Rust, refined. Play in browser with the power of WebAssembly! Course project of Programming Training, Tsinghua U…
☆16 · Jul 10, 2024 · Updated 2 years ago
DavidMChan / grazier
A tool for calling (and calling out to) large language models.
☆16 · Aug 13, 2024 · Updated last year
ylecun / yastro
☆13 · Jul 5, 2021 · Updated 5 years ago
Kyunnilee / visual_puzzles
🧩 Official code repository for “Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint.”
☆15 · Sep 22, 2025 · Updated 10 months ago
guanghuixu / AnchorCaptioner
☆30 · May 7, 2021 · Updated 5 years ago
dhg-wei / DeCap
ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning
☆144 · Mar 16, 2023 · Updated 3 years ago
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
interactive-cookbook / ara
Corpus and code for Aligned Recipe Actions (ARA) corpus, EMNLP 2021
☆10 · May 22, 2024 · Updated 2 years ago
ezjong / lightprobnets
LightProbNets
☆26 · Nov 20, 2019 · Updated 6 years ago
MikeWangWZHL / VidIL
PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117 · Sep 15, 2022 · Updated 3 years ago
c7w / cod22-grp64
>>> Exceptions and interrupts + virtual-memory page tables + branch prediction + TLB + Cache + Flash + VGA + uCore
☆20 · Nov 17, 2023 · Updated 2 years ago
HCPLab-SYSU / HCP-MLR-PL
Multi-label Image Recognition with Partial Labels (IJCV'24, ESWA'24, AAAI'22)
☆43 · Jul 15, 2024 · Updated 2 years ago
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆15 · May 13, 2025 · Updated last year
DISL-Lab / BalanceMix
☆15 · Dec 12, 2023 · Updated 2 years ago
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆76 · Sep 20, 2023 · Updated 2 years ago
jingyang2017 / AU-Net
Towards robust facial action units detection
☆23 · Jan 9, 2024 · Updated 2 years ago
object-understanding / SLASH
☆23 · Aug 26, 2023 · Updated 2 years ago
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆34 · Nov 27, 2025 · Updated 8 months ago