kyegomez / CM3Leon
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", a multimodal model that uses only a decoder to generate both text and images
★361 · Updated last year
Alternatives and similar repositories for CM3Leon
Users who are interested in CM3Leon are comparing it to the libraries listed below
- Official implementation of SEED-LLaMA (ICLR 2024). ★613 · Updated 9 months ago
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ★456 · Updated last year
- Open reproduction of MUSE for fast text2image generation. ★351 · Updated last year
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ★744 · Updated last year
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ★474 · Updated 9 months ago
- Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ★267 · Updated last year
- LLaVA-Interactive-Demo ★374 · Updated 10 months ago
- PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. ★428 · Updated last year
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ★860 · Updated last month
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ★583 · Updated 8 months ago
- DataComp: In search of the next generation of multimodal datasets ★717 · Updated last month
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ★385 · Updated 11 months ago
- Official Repository of ChatCaptioner ★464 · Updated 2 years ago
- Multimodal Models in Real World ★511 · Updated 3 months ago
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ★520 · Updated last year
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ★278 · Updated last year
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ★406 · Updated last year
- ★613 · Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ★314 · Updated last year
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ★448 · Updated 6 months ago
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis ★318 · Updated last year
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ★145 · Updated 2 months ago
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ★600 · Updated 2 months ago
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ★486 · Updated 10 months ago
- ★515 · Updated 6 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ★342 · Updated 5 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ★604 · Updated 7 months ago
- Better Aligning Text-to-Image Models with Human Preference. ICCV 2023 ★285 · Updated last year
- Large-scale text-video dataset. 10 million captioned short videos. ★642 · Updated 10 months ago
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ★240 · Updated 2 months ago