Data release for the ImageInWords (IIW) paper.
☆226 · Nov 17, 2024 · Updated last year
Alternatives and similar repositories for imageinwords
Users who are interested in imageinwords compare it to the repositories listed below.
- Densely Captioned Images (DCI) dataset repository. ☆197 · Jul 1, 2024 · Updated last year
- ☆33 · Nov 4, 2024 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024 ☆1,815 · Nov 27, 2025 · Updated 3 months ago
- [CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning ☆33 · May 25, 2025 · Updated 9 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆138 · May 8, 2025 · Updated 9 months ago
- Codebase for Aria - an Open Multimodal Native MoE ☆1,082 · Jan 22, 2025 · Updated last year
- This is the repository for the Photorealistic Unreal Graphics (PUG) datasets for representation learning. ☆237 · Apr 4, 2024 · Updated last year
- ☆75 · Mar 7, 2024 · Updated last year
- [ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆893 · Aug 13, 2024 · Updated last year
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025) ☆12 · Mar 6, 2025 · Updated 11 months ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ☆630 · Feb 1, 2026 · Updated last month
- 1.5–3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆226 · Aug 23, 2024 · Updated last year
- EdgeSAM model for use with Autodistill. ☆30 · Jun 11, 2024 · Updated last year
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series ☆1,086 · Jan 21, 2025 · Updated last year
- ☆156 · Oct 31, 2024 · Updated last year
- ☆111 · Jan 8, 2025 · Updated last year
- Imagen-mini for girl image generation ☆12 · Nov 19, 2022 · Updated 3 years ago
- Recipe for training fully featured self-supervised image JEPA models ☆12 · Jun 4, 2025 · Updated 8 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ☆945 · Aug 5, 2025 · Updated 6 months ago
- Official implementation of SEED-LLaMA (ICLR 2024). ☆640 · Sep 21, 2024 · Updated last year
- Code of paper "A new baseline for edge detection: Make Encoder-Decoder great again" ☆40 · Jun 11, 2025 · Updated 8 months ago
- ☆15 · Nov 30, 2023 · Updated 2 years ago
- Pipeline to scrape prompt + image url pairs from LAION `share-dalle-3` discord channel ☆11 · Oct 10, 2023 · Updated 2 years ago
- ☆15 · Dec 7, 2023 · Updated 2 years ago
- ☆58 · Apr 24, 2024 · Updated last year
- Privacy-first ear biometric segmentation - 99%+ accuracy with <2M parameters for edge authentication and GDPR compliance ☆31 · Oct 27, 2025 · Updated 4 months ago
- Hand and Face Detection for Sign Language ☆17 · Jan 15, 2026 · Updated last month
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆186 · Jul 5, 2024 · Updated last year
- Grounded Language-Image Pre-training ☆2,575 · Jan 24, 2024 · Updated 2 years ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,986 · Nov 7, 2025 · Updated 3 months ago
- Official code for CAVIS: Context-Aware Video Instance Segmentation ☆97 · Sep 17, 2025 · Updated 5 months ago
- ☆15 · Mar 12, 2024 · Updated last year
- [CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text ☆53 · Mar 16, 2025 · Updated 11 months ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆3,371 · May 19, 2025 · Updated 9 months ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,252 · Feb 16, 2025 · Updated last year
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ☆299 · Jan 23, 2025 · Updated last year
- ☆401 · Dec 12, 2024 · Updated last year