RotsteinNoam / FuseCap

FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions

☆55

Alternatives and similar repositories for FuseCap

Users that are interested in FuseCap are comparing it to the libraries listed below


navervision / CompoDiff
Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)
☆87 · Updated 9 months ago
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 · Updated 2 years ago
facebookresearch / genecis
Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"
☆61 · Updated 2 years ago
opendatalab / CLIP-Parrot-Bias
[ECCV 2024] Parrot Captions Teach CLIP to Spot Text
☆65 · Updated last year
j-min / DSG
Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
☆100 · Updated 11 months ago
NVlabs / PALAVRA
☆53 · Updated 3 years ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56 · Updated 2 years ago
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆101 · Updated 7 months ago
eslambakr / HRS_benchmark
☆61 · Updated 2 years ago
linzhiqiu / CLIP-FlanT5
Training code for CLIP-FlanT5
☆30 · Updated last year
Hritikbansal / videocon
☆58 · Updated last year
codezakh / LilT
[ICLR 2023] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
☆40 · Updated 2 years ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58 · Updated 2 years ago
microsoft / LAVENDER
A Unified Framework for Video-Language Understanding
☆60 · Updated 2 years ago
yeezhu / UNIT
PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.
☆31 · Updated last year
zhjohnchan / SK-VG
[CVPR 2023] The official dataset of "Advancing Visual Grounding with Scene Knowledge: Benchmark and Method".
☆32 · Updated 2 years ago
BrandonHanx / FAME-ViL
[CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks
☆55 · Updated 2 years ago
YujieLu10 / LLMScore
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
☆133 · Updated 2 years ago
hananshafi / llmblueprint
[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"
☆82 · Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 · Updated last year
DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
☆42 · Updated 2 years ago
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆44 · Updated last year
eric-ai-lab / Discffusion
Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
☆30 · Updated last year
showlab / cosmo
☆73 · Updated last year
facebookresearch / CiT
Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data".
☆78 · Updated 2 years ago
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆212 · Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94 · Updated last year
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆191 · Updated last year
salesforce / MUST
PyTorch code for MUST
☆107 · Updated 6 months ago
ZhangYuanhan-AI / visual_prompt_retrieval
[NeurIPS 2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
☆179 · Updated last year