kyegomez / CM3Leon

An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", a multimodal model that uses only a decoder to generate both text and images

☆364

Alternatives and similar repositories for CM3Leon

Users who are interested in CM3Leon compare it to the repositories listed below.

baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,765 · Jan 12, 2026 · Updated last month
AILab-CVC / SEED
Official implementation of SEED-LLaMA (ICLR 2024).
☆639 · Sep 21, 2024 · Updated last year
kohjingyu / gill
🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
☆471 · Jan 19, 2024 · Updated 2 years ago
allenai / unified-io-2
☆643 · Feb 15, 2024 · Updated last year
huggingface / open-muse
Open reproduction of MUSE for fast text2image generation.
☆359 · Jun 1, 2024 · Updated last year
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆458 · Dec 2, 2024 · Updated last year
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆602 · Oct 6, 2024 · Updated last year
PixArt-alpha / PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
☆3,279 · Oct 31, 2024 · Updated last year
google-research / magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
☆993 · Jan 17, 2024 · Updated 2 years ago
kyegomez / PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆146 · Jan 17, 2026 · Updated 3 weeks ago
mlfoundations / open_flamingo
An open-source framework for training large multimodal models.
☆4,068 · Aug 31, 2024 · Updated last year
Alpha-VLLM / LLaMA2-Accessory
An Open-source Toolkit for LLM Development
☆2,804 · Jan 13, 2025 · Updated last year
tsb0601 / MMVP
☆360 · Jan 27, 2024 · Updated 2 years ago
llava-rlhf / LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
☆392 · Nov 1, 2023 · Updated 2 years ago
AILab-CVC / FreeNoise
[ICLR 2024] Code for FreeNoise based on VideoCrafter
☆426 · Aug 25, 2025 · Updated 5 months ago
X-PLUG / mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,537 · Apr 2, 2025 · Updated 10 months ago
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,932 · Aug 15, 2024 · Updated last year
salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,166 · Nov 18, 2024 · Updated last year
Zhendong-Wang / Prompt-Diffusion
Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"
☆413 · Mar 25, 2024 · Updated last year
lucidrains / magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
☆661 · Jan 12, 2025 · Updated last year
openai / consistencydecoder
Consistency Distilled Diff VAE
☆2,207 · Nov 7, 2023 · Updated 2 years ago
thu-ml / unidiffuser
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
☆1,473 · May 31, 2023 · Updated 2 years ago
YangLing0818 / RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
☆1,843 · Feb 1, 2025 · Updated last year
facebookresearch / chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
☆2,085 · Jul 29, 2024 · Updated last year
google / prompt-to-prompt
☆3,438 · May 14, 2024 · Updated last year
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188 · May 1, 2025 · Updated 9 months ago
whlzy / FiT
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
☆432 · Nov 10, 2024 · Updated last year
Alpha-VLLM / Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
☆2,251 · Feb 16, 2025 · Updated 11 months ago
YingqingHe / ScaleCrafter
[ICLR 2024 Spotlight] Official implementation of ScaleCrafter for higher-resolution visual generation at inference time.
☆510 · Mar 7, 2024 · Updated last year
facebookresearch / ImageBind
ImageBind: One Embedding Space to Bind Them All
☆8,971 · Nov 21, 2025 · Updated 2 months ago
shikras / shikra
☆805 · Jul 8, 2024 · Updated last year
EvolvingLMMs-Lab / Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing imp…
☆3,292 · Mar 5, 2024 · Updated last year
JourneyDB / JourneyDB
☆180 · Nov 14, 2025 · Updated 3 months ago
eric-ai-lab / MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
☆864 · May 8, 2025 · Updated 9 months ago
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆213 · Feb 27, 2024 · Updated last year
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,342 · Oct 5, 2023 · Updated 2 years ago
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,339 · Jan 12, 2026 · Updated last month
TencentQQGYLab / ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
☆1,276 · Jul 17, 2024 · Updated last year
TencentARC / Mix-of-Show
NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
☆428 · May 14, 2024 · Updated last year
