Data release for the ImageInWords (IIW) paper.
☆225 · Nov 17, 2024 · Updated last year
Alternatives and similar repositories for imageinwords
Users who are interested in imageinwords are comparing it to the libraries listed below.
- Densely Captioned Images (DCI) dataset repository. ☆195 · Jul 1, 2024 · Updated last year
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025) ☆14 · Mar 6, 2025 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- [CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning ☆33 · May 25, 2025 · Updated 11 months ago
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,834 · Nov 27, 2025 · Updated 5 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆138 · May 8, 2025 · Updated 11 months ago
- ☆33 · Nov 4, 2024 · Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ☆652 · Feb 1, 2026 · Updated 3 months ago
- ☆157 · Oct 31, 2024 · Updated last year
- Codebase for Aria - an Open Multimodal Native MoE ☆1,087 · Jan 22, 2025 · Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆897 · Aug 13, 2024 · Updated last year
- ☆75 · Mar 7, 2024 · Updated 2 years ago
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ☆1,104 · Jan 21, 2025 · Updated last year
- This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning. ☆239 · Apr 4, 2024 · Updated 2 years ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆952 · Aug 5, 2025 · Updated 9 months ago
- Imagen-mini for girl image generation ☆12 · Nov 19, 2022 · Updated 3 years ago
- When do we not need larger vision models? ☆418 · Feb 8, 2025 · Updated last year
- ☆112 · Jan 8, 2025 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆641 · Sep 21, 2024 · Updated last year
- ☆402 · Dec 12, 2024 · Updated last year
- Code of paper "A new baseline for edge detection: Make Encoder-Decoder great again" ☆42 · Apr 11, 2026 · Updated 3 weeks ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,996 · Nov 7, 2025 · Updated 5 months ago
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆187 · Jul 5, 2024 · Updated last year
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,253 · Feb 16, 2025 · Updated last year
- Official implementation of TagAlign ☆37 · Dec 11, 2024 · Updated last year
- ☆58 · Apr 24, 2024 · Updated 2 years ago
- Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel ☆11 · Oct 10, 2023 · Updated 2 years ago
- ☆15 · May 13, 2024 · Updated last year
- Code for T-MARS data filtering ☆35 · Aug 23, 2023 · Updated 2 years ago
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆227 · Aug 23, 2024 · Updated last year
- Grounded Language-Image Pre-training ☆2,588 · Jan 24, 2024 · Updated 2 years ago
- EdgeSAM model for use with Autodistill. ☆30 · Jun 11, 2024 · Updated last year
- RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers" ☆17 · Aug 24, 2023 · Updated 2 years ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆419 · May 5, 2025 · Updated 11 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆300 · Jan 23, 2025 · Updated last year
- ☆4,645 · Apr 15, 2026 · Updated 2 weeks ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆3,436 · May 19, 2025 · Updated 11 months ago
- [CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text ☆53 · Mar 16, 2025 · Updated last year
- ☆17 · Oct 30, 2022 · Updated 3 years ago