jy0205/LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

☆603

Alternatives and similar repositories for LaVIT

Users who are interested in LaVIT are comparing it to the repositories listed below.

AILab-CVC / SEED
Official implementation of SEED-LLaMA (ICLR 2024).
☆642 · Sep 21, 2024 · Updated last year
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆462 · Dec 2, 2024 · Updated last year
TencentARC / SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
☆1,020 · Nov 25, 2025 · Updated 8 months ago
FoundationVision / OmniTokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
☆325 · Jul 9, 2024 · Updated 2 years ago
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,960 · Aug 15, 2024 · Updated last year
lucidrains / magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
☆668 · Jan 12, 2025 · Updated last year
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,965 · Jan 8, 2026 · Updated 6 months ago
AILab-CVC / SEED-X
Multimodal Models in Real World
☆558 · Feb 24, 2025 · Updated last year
snap-research / Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
☆700 · Oct 25, 2024 · Updated last year
google-research / magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
☆1,002 · Jan 17, 2024 · Updated 2 years ago
bytedance / 1d-tokenizer
This repo contains the code for 1D tokenizer and generator
☆1,168 · Mar 20, 2025 · Updated last year
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆425 · Apr 25, 2025 · Updated last year
kohjingyu / gill
🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
☆470 · Jan 19, 2024 · Updated 2 years ago
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,432 · Jan 12, 2026 · Updated 6 months ago
Vchitect / Latte
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
☆1,948 · Oct 30, 2025 · Updated 8 months ago
shikras / shikra
☆814 · Jul 8, 2024 · Updated 2 years ago
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆255 · Apr 3, 2024 · Updated 2 years ago
UCSB-AI / MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
☆867 · May 8, 2025 · Updated last year
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215 · Feb 27, 2024 · Updated 2 years ago
Vchitect / LaVie
[IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
☆952 · Nov 13, 2024 · Updated last year
AILab-CVC / SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆366 · Jan 14, 2025 · Updated last year
Alpha-VLLM / Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini…
☆646 · Oct 16, 2025 · Updated 9 months ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year
NVIDIA / Cosmos-Tokenizer
A suite of image and video neural tokenizers
☆1,732 · Feb 11, 2025 · Updated last year
LLaVA-VL / LLaVA-NeXT
☆4,712 · Jun 15, 2026 · Updated last month
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆59 · Jun 27, 2023 · Updated 3 years ago
tsb0601 / MMVP
☆365 · Jan 27, 2024 · Updated 2 years ago
NUS-HPC-AI-Lab / VideoSys
VideoSys: An easy and efficient system for video generation
☆2,025 · Aug 27, 2025 · Updated 11 months ago
wenhaochai / MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
☆706 · Jan 29, 2025 · Updated last year
allenai / unified-io-2
☆650 · Feb 15, 2024 · Updated 2 years ago
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507 · Aug 9, 2024 · Updated last year
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆669 · Jul 28, 2024 · Updated 2 years ago
HaozheZhao / MIC
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆361 · Dec 18, 2023 · Updated 2 years ago
OpenGVLab / VisionLLM
VisionLLM Series
☆1,153 · Feb 27, 2025 · Updated last year
mira-space / MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
☆527 · Sep 2, 2024 · Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,011 · Nov 7, 2025 · Updated 8 months ago
Meituan-AutoML / VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
☆392 · Jul 9, 2024 · Updated 2 years ago