google/imageinwords

Data release for the ImageInWords (IIW) paper.

☆224

Alternatives and similar repositories for imageinwords

Users who are interested in imageinwords are comparing it to the repositories listed below.

facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆197 · Jul 1, 2024 · Updated 2 years ago
MAGAer13 / DeCapBench
Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)
☆14 · Mar 6, 2025 · Updated last year
YuigaWada / Polos
[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
☆33 · Jun 12, 2026 · Updated last month
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 · Dec 6, 2024 · Updated last year
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,848 · Nov 27, 2025 · Updated 7 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138 · May 8, 2025 · Updated last year
cloneofsimo / repa-rf
☆32 · Nov 4, 2024 · Updated last year
RifleZhang / LLaVA-Hound-DPO
☆158 · Oct 31, 2024 · Updated last year
rhymes-ai / Aria
Codebase for Aria - an Open Multimodal Native MoE
☆1,086 · Jan 22, 2025 · Updated last year
FudanNLPLAB / MouSi
☆75 · Mar 7, 2024 · Updated 2 years ago
beichenzbc / Long-CLIP
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
☆901 · Aug 13, 2024 · Updated last year
IDEA-Research / Grounding-DINO-1.5-API
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
☆1,139 · Jan 21, 2025 · Updated last year
microsoft / LLM2CLIP
LLM2CLIP significantly improves already state-of-the-art CLIP models.
☆679 · Feb 1, 2026 · Updated 5 months ago
facebookresearch / PUG
This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning.
☆239 · Apr 4, 2024 · Updated 2 years ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆420 · Feb 8, 2025 · Updated last year
alfredplpl / imagen-mini-girl
Imagen-mini for girl image generation
☆12 · Nov 19, 2022 · Updated 3 years ago
snap-research / MyVLM
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
☆188 · Jul 5, 2024 · Updated 2 years ago
AILab-CVC / SEED
Official implementation of SEED-LLaMA (ICLR 2024).
☆642 · Sep 21, 2024 · Updated last year
xmoanvaf / llava-phi
☆401 · Dec 12, 2024 · Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago
Alpha-VLLM / Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
☆2,247 · Feb 16, 2025 · Updated last year
Li-yachuan / NBED
Code of paper "A new baseline for edge detection: Make Encoder-Decoder great again"
☆44 · Apr 11, 2026 · Updated 3 months ago
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆37 · Dec 11, 2024 · Updated last year
Hritikbansal / videocon
☆58 · Apr 24, 2024 · Updated 2 years ago
ZachNagengast / LAION-Dalle-Scraper
Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel
☆11 · Oct 10, 2023 · Updated 2 years ago
locuslab / T-MARS
Code for T-MARS data filtering
☆35 · Aug 23, 2023 · Updated 2 years ago
LeapLabTHU / EfficientTrain
1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio…
☆231 · Aug 23, 2024 · Updated last year
autodistill / autodistill-grounded-edgesam
EdgeSAM model for use with Autodistill.
☆30 · Jun 11, 2024 · Updated 2 years ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,605 · Jan 24, 2024 · Updated 2 years ago
mcahny / rovit
RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers"
☆17 · Aug 24, 2023 · Updated 2 years ago
LLaVA-VL / LLaVA-NeXT
☆4,713 · Jun 15, 2026 · Updated last month
bino282 / ViNLP
☆17 · Oct 30, 2022 · Updated 3 years ago
j-min / DSG
Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
☆109 · Dec 9, 2024 · Updated last year
google-research / big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,500 · May 19, 2025 · Updated last year
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425 · May 5, 2025 · Updated last year
baaivision / DIVA
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
☆301 · Jan 23, 2025 · Updated last year
GbotHQ / Blender-3D-document-rendering-pipeline
Render documents on a virtual paper with folds and other types of damage using blender geometry nodes.
☆27 · Aug 14, 2023 · Updated 2 years ago
ant-research / lumos
[CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text
☆52 · Mar 16, 2025 · Updated last year