Official implementation of SEED-LLaMA (ICLR 2024).
★641 · Sep 21, 2024 · Updated last year
Alternatives and similar repositories for SEED
Users who are interested in SEED are comparing it to the libraries listed below.
- Multimodal Models in Real World ★559 · Feb 24, 2025 · Updated last year
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ★471 · Jan 19, 2024 · Updated 2 years ago
- Emu Series: Generative Multimodal Models from BAAI ★1,774 · Jan 12, 2026 · Updated 3 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ★604 · Oct 6, 2024 · Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ★363 · Jan 14, 2025 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★460 · Dec 2, 2024 · Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers ★1,003 · Nov 25, 2025 · Updated 5 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ★1,948 · Aug 15, 2024 · Updated last year
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ★1,919 · Jan 8, 2026 · Updated 3 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ★59 · Jun 27, 2023 · Updated 2 years ago
- Next-Token Prediction is All You Need ★2,402 · Jan 12, 2026 · Updated 3 months ago
- ★647 · Feb 15, 2024 · Updated 2 years ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ★865 · May 8, 2025 · Updated 11 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ★369 · Jul 24, 2025 · Updated 9 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ★954 · Mar 19, 2025 · Updated last year
- This repo contains the code for 1D tokenizer and generator ★1,145 · Mar 20, 2025 · Updated last year
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ★253 · Apr 3, 2024 · Updated 2 years ago
- EVA Series: Visual Representation Fantasies from BAAI ★2,669 · Aug 1, 2024 · Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ★214 · Feb 27, 2024 · Updated 2 years ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ★555 · Jun 3, 2025 · Updated 10 months ago
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" ★1,479 · May 31, 2023 · Updated 2 years ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ★2,094 · Jul 29, 2024 · Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence ★11,212 · Nov 18, 2024 · Updated last year
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ★86 · Feb 27, 2025 · Updated last year
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ★645 · Oct 16, 2025 · Updated 6 months ago
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ★414 · Mar 25, 2024 · Updated 2 years ago
- Implementation of MagViT2 Tokenizer in Pytorch ★660 · Jan 12, 2025 · Updated last year
- [CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language ★1,343 · Oct 5, 2023 · Updated 2 years ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ★1,996 · Nov 7, 2025 · Updated 5 months ago
- VisionLLM Series ★1,144 · Feb 27, 2025 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ★159 · Dec 6, 2024 · Updated last year
- ★4,645 · Apr 15, 2026 · Updated 2 weeks ago
- An open-source framework for training large multimodal models. ★4,085 · Aug 31, 2024 · Updated last year
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… ★3,374 · Mar 5, 2024 · Updated 2 years ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ★837 · Jun 16, 2025 · Updated 10 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ★879 · Aug 27, 2024 · Updated last year
- ★806 · Jul 8, 2024 · Updated last year
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. ★323 · Jul 9, 2024 · Updated last year
- Turning to Video for Transcript Sorting ★49 · Aug 27, 2023 · Updated 2 years ago