Data release for the ImageInWords (IIW) paper.
☆225 · Nov 17, 2024 · Updated last year
Alternatives and similar repositories for imageinwords
Users who are interested in imageinwords are comparing it to the libraries listed below.
- Densely Captioned Images (DCI) dataset repository. ☆197 · Jul 1, 2024 · Updated last year
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025) ☆14 · Mar 6, 2025 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- [CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning ☆34 · May 25, 2025 · Updated last year
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,839 · Nov 27, 2025 · Updated 6 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆139 · May 8, 2025 · Updated last year
- ☆33 · Nov 4, 2024 · Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ☆670 · Feb 1, 2026 · Updated 4 months ago
- ☆157 · Oct 31, 2024 · Updated last year
- Codebase for Aria - an Open Multimodal Native MoE ☆1,088 · Jan 22, 2025 · Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆900 · Aug 13, 2024 · Updated last year
- ☆75 · Mar 7, 2024 · Updated 2 years ago
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ☆1,122 · Jan 21, 2025 · Updated last year
- This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning. ☆239 · Apr 4, 2024 · Updated 2 years ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆959 · Aug 5, 2025 · Updated 10 months ago
- Imagen-mini for girl image generation ☆12 · Nov 19, 2022 · Updated 3 years ago
- ☆115 · Jan 8, 2025 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆641 · Sep 21, 2024 · Updated last year
- ☆400 · Dec 12, 2024 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆2,003 · Nov 7, 2025 · Updated 7 months ago
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆188 · Jul 5, 2024 · Updated last year
- Code of paper "A new baseline for edge detection: Make Encoder-Decoder great again" ☆43 · Apr 11, 2026 · Updated 2 months ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,251 · Feb 16, 2025 · Updated last year
- Official implementation of TagAlign ☆37 · Dec 11, 2024 · Updated last year
- ☆58 · Apr 24, 2024 · Updated 2 years ago
- Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel ☆11 · Oct 10, 2023 · Updated 2 years ago
- ☆15 · May 13, 2024 · Updated 2 years ago
- Code for T-MARS data filtering ☆35 · Aug 23, 2023 · Updated 2 years ago
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆228 · Aug 23, 2024 · Updated last year
- Grounded Language-Image Pre-training ☆2,599 · Jan 24, 2024 · Updated 2 years ago
- EdgeSAM model for use with Autodistill. ☆30 · Jun 11, 2024 · Updated 2 years ago
- RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers" ☆17 · Aug 24, 2023 · Updated 2 years ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆424 · May 5, 2025 · Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆301 · Jan 23, 2025 · Updated last year
- ☆4,687 · Apr 15, 2026 · Updated last month
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆3,464 · May 19, 2025 · Updated last year
- [CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text ☆53 · Mar 16, 2025 · Updated last year
- ☆17 · Oct 30, 2022 · Updated 3 years ago
- Render documents on a virtual paper with folds and other types of damage using blender geometry nodes. ☆27 · Aug 14, 2023 · Updated 2 years ago