Official implementation of SEED-LLaMA (ICLR 2024).
☆641 · Sep 21, 2024 · Updated last year
Alternatives and similar repositories for SEED
Users that are interested in SEED are comparing it to the libraries listed below.
- Multimodal Models in Real World ☆557 · Feb 24, 2025 · Updated last year
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆471 · Jan 19, 2024 · Updated 2 years ago
- Emu Series: Generative Multimodal Models from BAAI ☆1,772 · Jan 12, 2026 · Updated 3 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆605 · Oct 6, 2024 · Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆363 · Jan 14, 2025 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆460 · Dec 2, 2024 · Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆1,002 · Nov 25, 2025 · Updated 4 months ago
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,909 · Jan 8, 2026 · Updated 3 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ☆1,941 · Aug 15, 2024 · Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆59 · Jun 27, 2023 · Updated 2 years ago
- Next-Token Prediction is All You Need ☆2,393 · Jan 12, 2026 · Updated 3 months ago
- ☆644 · Feb 15, 2024 · Updated 2 years ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆865 · May 8, 2025 · Updated 11 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ☆368 · Jul 24, 2025 · Updated 8 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ☆954 · Mar 19, 2025 · Updated last year
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆252 · Apr 3, 2024 · Updated 2 years ago
- This repo contains the code for 1D tokenizer and generator ☆1,140 · Mar 20, 2025 · Updated last year
- EVA Series: Visual Representation Fantasies from BAAI ☆2,661 · Aug 1, 2024 · Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ☆214 · Feb 27, 2024 · Updated 2 years ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ☆554 · Jun 3, 2025 · Updated 10 months ago
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" ☆1,480 · May 31, 2023 · Updated 2 years ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆2,094 · Jul 29, 2024 · Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆11,192 · Nov 18, 2024 · Updated last year
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆86 · Feb 27, 2025 · Updated last year
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ☆414 · Mar 25, 2024 · Updated 2 years ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆660 · Jan 12, 2025 · Updated last year
- [CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language ☆1,343 · Oct 5, 2023 · Updated 2 years ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,992 · Nov 7, 2025 · Updated 5 months ago
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ☆644 · Oct 16, 2025 · Updated 5 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- VisionLLM Series ☆1,142 · Feb 27, 2025 · Updated last year
- ☆4,628 · Sep 14, 2025 · Updated 6 months ago
- An open-source framework for training large multimodal models. ☆4,083 · Aug 31, 2024 · Updated last year
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… ☆3,360 · Mar 5, 2024 · Updated 2 years ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆833 · Jun 16, 2025 · Updated 9 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ☆877 · Aug 27, 2024 · Updated last year
- ☆808 · Jul 8, 2024 · Updated last year
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. ☆322 · Jul 9, 2024 · Updated last year
- Turning to Video for Transcript Sorting ☆49 · Aug 27, 2023 · Updated 2 years ago