Official implementation of SEED-LLaMA (ICLR 2024).
☆641 · Updated Sep 21, 2024
Alternatives and similar repositories for SEED
Users who are interested in SEED are comparing it to the repositories listed below.
- Multimodal Models in Real World · ☆559 · Updated Feb 24, 2025
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". · ☆472 · Updated Jan 19, 2024
- Emu Series: Generative Multimodal Models from BAAI · ☆1,774 · Updated Jan 12, 2026
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content · ☆604 · Updated Oct 6, 2024
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. · ☆363 · Updated Jan 14, 2025
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation · ☆461 · Updated Dec 2, 2024
- SEED-Voken: A Series of Powerful Visual Tokenizers · ☆1,008 · Updated Nov 25, 2025
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation · ☆1,948 · Updated Aug 15, 2024
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. · ☆1,930 · Updated Jan 8, 2026
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". · ☆60 · Updated Jun 27, 2023
- Next-Token Prediction is All You Need · ☆2,408 · Updated Jan 12, 2026
- ☆648 · Updated Feb 15, 2024
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" · ☆866 · Updated May 8, 2025
- EVE Series: Encoder-Free Vision-Language Models from BAAI · ☆369 · Updated Jul 24, 2025
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. · ☆953 · Updated Mar 19, 2025
- This repo contains the code for 1D tokenizer and generator · ☆1,150 · Updated Mar 20, 2025
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer · ☆253 · Updated Apr 3, 2024
- EVA Series: Visual Representation Fantasies from BAAI · ☆2,675 · Updated Aug 1, 2024
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale · ☆214 · Updated Feb 27, 2024
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest · ☆555 · Updated Jun 3, 2025
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" · ☆1,483 · Updated May 31, 2023
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. · ☆2,096 · Updated Jul 29, 2024
- LAVIS - A One-stop Library for Language-Vision Intelligence · ☆11,221 · Updated Nov 18, 2024
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) · ☆87 · Updated Feb 27, 2025
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… · ☆646 · Updated Oct 16, 2025
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" · ☆414 · Updated Mar 25, 2024
- Implementation of MagViT2 Tokenizer in PyTorch · ☆662 · Updated Jan 12, 2025
- [CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language · ☆1,344 · Updated Oct 5, 2023
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. · ☆1,998 · Updated Nov 7, 2025
- VisionLLM Series · ☆1,146 · Updated Feb 27, 2025
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception · ☆159 · Updated Dec 6, 2024
- ☆4,658 · Updated Apr 15, 2026
- An open-source framework for training large multimodal models. · ☆4,099 · Updated Aug 31, 2024
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… · ☆3,379 · Updated Mar 5, 2024
- [Extended version ICLR 2025 Blog Track] Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generatio… · ☆838 · Updated Jun 16, 2025
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" · ☆882 · Updated Aug 27, 2024
- ☆809 · Updated Jul 8, 2024
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. · ☆323 · Updated Jul 9, 2024
- Turning to Video for Transcript Sorting · ☆50 · Updated Aug 27, 2023