Data release for the ImageInWords (IIW) paper.
☆225 · Nov 17, 2024 · Updated last year
Alternatives and similar repositories for imageinwords
Users who are interested in imageinwords are comparing it to the repositories listed below.
- Densely Captioned Images (DCI) dataset repository. ☆196 · Jul 1, 2024 · Updated last year
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025) ☆13 · Mar 6, 2025 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆158 · Dec 6, 2024 · Updated last year
- [CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning ☆33 · May 25, 2025 · Updated 10 months ago
- NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,832 · Nov 27, 2025 · Updated 4 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆138 · May 8, 2025 · Updated 11 months ago
- ☆33 · Nov 4, 2024 · Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ☆645 · Feb 1, 2026 · Updated 2 months ago
- ☆157 · Oct 31, 2024 · Updated last year
- Codebase for Aria - an Open Multimodal Native MoE ☆1,084 · Jan 22, 2025 · Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆895 · Aug 13, 2024 · Updated last year
- ☆75 · Mar 7, 2024 · Updated 2 years ago
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ☆1,095 · Jan 21, 2025 · Updated last year
- This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning. ☆239 · Apr 4, 2024 · Updated 2 years ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆951 · Aug 5, 2025 · Updated 8 months ago
- Imagen-mini for girl image generation ☆12 · Nov 19, 2022 · Updated 3 years ago
- When do we not need larger vision models? ☆418 · Feb 8, 2025 · Updated last year
- ☆402 · Dec 12, 2024 · Updated last year
- ☆112 · Jan 8, 2025 · Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆641 · Sep 21, 2024 · Updated last year
- Code for the paper "A new baseline for edge detection: Make Encoder-Decoder great again" ☆41 · Updated this week
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,992 · Nov 7, 2025 · Updated 5 months ago
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆186 · Jul 5, 2024 · Updated last year
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,253 · Feb 16, 2025 · Updated last year
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- Official implementation of TagAlign ☆37 · Dec 11, 2024 · Updated last year
- ☆58 · Apr 24, 2024 · Updated last year
- Pipeline to scrape prompt + image URL pairs from the LAION `share-dalle-3` Discord channel ☆11 · Oct 10, 2023 · Updated 2 years ago
- ☆15 · May 13, 2024 · Updated last year
- Code for T-MARS data filtering ☆35 · Aug 23, 2023 · Updated 2 years ago
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆226 · Aug 23, 2024 · Updated last year
- Grounded Language-Image Pre-training ☆2,584 · Jan 24, 2024 · Updated 2 years ago
- EdgeSAM model for use with Autodistill. ☆30 · Jun 11, 2024 · Updated last year
- RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers" ☆17 · Aug 24, 2023 · Updated 2 years ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆415 · May 5, 2025 · Updated 11 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆301 · Jan 23, 2025 · Updated last year
- ☆4,628 · Sep 14, 2025 · Updated 6 months ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆3,419 · May 19, 2025 · Updated 10 months ago
- ☆17 · Oct 30, 2022 · Updated 3 years ago