Official implementation of SEED-LLaMA (ICLR 2024).
★641 · Sep 21, 2024 · Updated last year
Alternatives and similar repositories for SEED
Users who are interested in SEED are comparing it to the repositories listed below.
- Multimodal Models in Real World ★558 · Feb 24, 2025 · Updated last year
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ★472 · Jan 19, 2024 · Updated 2 years ago
- Emu Series: Generative Multimodal Models from BAAI ★1,775 · Jan 12, 2026 · Updated 4 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ★601 · Oct 6, 2024 · Updated last year
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ★364 · Jan 14, 2025 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★462 · Dec 2, 2024 · Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers ★1,011 · Nov 25, 2025 · Updated 6 months ago
- Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation ★1,953 · Aug 15, 2024 · Updated last year
- [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ★1,944 · Jan 8, 2026 · Updated 5 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ★60 · Jun 27, 2023 · Updated 2 years ago
- Next-Token Prediction is All You Need ★2,417 · Jan 12, 2026 · Updated 4 months ago
- ★648 · Feb 15, 2024 · Updated 2 years ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ★865 · May 8, 2025 · Updated last year
- EVE Series: Encoder-Free Vision-Language Models from BAAI ★369 · Jul 24, 2025 · Updated 10 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text. ★954 · Mar 19, 2025 · Updated last year
- This repo contains the code for 1D tokenizer and generator ★1,155 · Mar 20, 2025 · Updated last year
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ★253 · Apr 3, 2024 · Updated 2 years ago
- EVA Series: Visual Representation Fantasies from BAAI ★2,683 · Aug 1, 2024 · Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale ★214 · Feb 27, 2024 · Updated 2 years ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ★555 · Jun 3, 2025 · Updated last year
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" ★1,483 · May 31, 2023 · Updated 3 years ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ★2,097 · Jul 29, 2024 · Updated last year
- LAVIS - A One-stop Library for Language-Vision Intelligence ★11,236 · Jun 2, 2026 · Updated last week
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ★87 · Feb 27, 2025 · Updated last year
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ★647 · Oct 16, 2025 · Updated 7 months ago
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ★414 · Mar 25, 2024 · Updated 2 years ago
- Implementation of MagViT2 Tokenizer in Pytorch ★664 · Jan 12, 2025 · Updated last year
- [CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language ★1,344 · Oct 5, 2023 · Updated 2 years ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ★2,003 · Nov 7, 2025 · Updated 7 months ago
- VisionLLM Series ★1,149 · Feb 27, 2025 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ★159 · Dec 6, 2024 · Updated last year
- ★4,687 · Apr 15, 2026 · Updated last month
- An open-source framework for training large multimodal models. ★4,105 · Aug 31, 2024 · Updated last year
- 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp… ★3,401 · Mar 5, 2024 · Updated 2 years ago
- [Extended version ICLR 2025 Blog Track] Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generatio… ★841 · Jun 16, 2025 · Updated 11 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ★882 · Aug 27, 2024 · Updated last year
- ★812 · Jul 8, 2024 · Updated last year
- [NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization. ★323 · Jul 9, 2024 · Updated last year
- Turning to Video for Transcript Sorting ★50 · Aug 27, 2023 · Updated 2 years ago