Official implementation of SEED-LLaMA (ICLR 2024).
★642 · Sep 21, 2024 · Updated last year
Alternatives and similar repositories for SEED
Users that are interested in SEED are comparing it to the libraries listed below.
- Multimodal Models in Real World · ★557 · Feb 24, 2025 · Updated last year
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". · ★472 · Jan 19, 2024 · Updated 2 years ago
- Emu Series: Generative Multimodal Models from BAAI · ★1,775 · Jan 12, 2026 · Updated 5 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content · ★603 · Oct 6, 2024 · Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. · ★364 · Jan 14, 2025 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ★462 · Dec 2, 2024 · Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers · ★1,012 · Nov 25, 2025 · Updated 7 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ★1,957 · Aug 15, 2024 · Updated last year
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. · ★1,958 · Jan 8, 2026 · Updated 5 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". · ★60 · Jun 27, 2023 · Updated 3 years ago
- Next-Token Prediction is All You Need · ★2,423 · Jan 12, 2026 · Updated 5 months ago
- ★649 · Feb 15, 2024 · Updated 2 years ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" · ★866 · May 8, 2025 · Updated last year
- EVE Series: Encoder-Free Vision-Language Models from BAAI · ★372 · Jul 24, 2025 · Updated 11 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. · ★954 · Mar 19, 2025 · Updated last year
- This repo contains the code for 1D tokenizer and generator · ★1,162 · Mar 20, 2025 · Updated last year
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer · ★253 · Apr 3, 2024 · Updated 2 years ago
- EVA Series: Visual Representation Fantasies from BAAI · ★2,683 · Aug 1, 2024 · Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale · ★214 · Feb 27, 2024 · Updated 2 years ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest · ★556 · Jun 3, 2025 · Updated last year
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" · ★1,484 · May 31, 2023 · Updated 3 years ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. · ★2,102 · Jul 29, 2024 · Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence · ★11,243 · Jun 2, 2026 · Updated 3 weeks ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) · ★88 · Feb 27, 2025 · Updated last year
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… · ★646 · Oct 16, 2025 · Updated 8 months ago
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" · ★414 · Mar 25, 2024 · Updated 2 years ago
- Implementation of MagViT2 Tokenizer in Pytorch · ★666 · Jan 12, 2025 · Updated last year
- [CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language · ★1,348 · Oct 5, 2023 · Updated 2 years ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. · ★2,005 · Nov 7, 2025 · Updated 7 months ago
- VisionLLM Series · ★1,148 · Feb 27, 2025 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception · ★159 · Dec 6, 2024 · Updated last year
- ★4,695 · Jun 15, 2026 · Updated 2 weeks ago
- An open-source framework for training large multimodal models. · ★4,110 · Aug 31, 2024 · Updated last year
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… · ★3,416 · Mar 5, 2024 · Updated 2 years ago
- [Extended version ICLR 2025 Blog Track] Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generatio… · ★840 · Jun 16, 2025 · Updated last year
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" · ★880 · Aug 27, 2024 · Updated last year
- ★813 · Jul 8, 2024 · Updated last year
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. · ★324 · Jul 9, 2024 · Updated last year
- Densely Captioned Images (DCI) dataset repository. · ★197 · Jul 1, 2024 · Updated 2 years ago