zhijie-group / Show-o-Turbo

☆33

Alternatives and similar repositories for Show-o-Turbo:

Users interested in Show-o-Turbo are comparing it to the repositories listed below.

hp-l33 / ARPG
Autoregressive Image Generation with Randomized Parallel Decoding
☆53 Updated last month
Pepper-lll / LMforImageGeneration
Codebase for the paper "Elucidating the design space of language models for image generation"
☆45 Updated 5 months ago
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆69 Updated 2 months ago
RenShuhuai-Andy / NBP
Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
☆31 Updated 2 months ago
JiaqiLiao77 / ImageGen-CoT
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
☆31 Updated last month
slowfast-vgen / slowfast-vgen
☆21 Updated 6 months ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆39 Updated 2 months ago
TencentARC / SEED-Bench-R1
☆79 Updated last month
Gen-Verse / HermesFlow
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆57 Updated 2 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆50 Updated 4 months ago
si0wang / VisVM
☆40 Updated 4 months ago
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆55 Updated 2 months ago
thu-ml / CCA
Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"
☆31 Updated 2 months ago
Owen718 / LongPrompt-LLamaGen
This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…
☆30 Updated 6 months ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆39 Updated last month
daeunni / VideoRepair
Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement"
☆47 Updated 5 months ago
LeapLabTHU / AdaNAT
[ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
☆33 Updated 7 months ago
yfzhang114 / r1_reward
✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆63 Updated this week
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆35 Updated 10 months ago
mengcye / LAION-SG
☆52 Updated 2 weeks ago
hu-zijing / B2-DiffuRL
A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.
☆29 Updated last month
Hon-Wong / ByteVideoLLM
This is the official repo for ByteVideoLLM/Dynamic-VLM
☆20 Updated 4 months ago
zhouyiks / CoLVA
☆28 Updated 4 months ago
EvolvingLMMs-Lab / VideoMMMU
☆40 Updated last month
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆28 Updated 2 months ago
Mozhgan91 / LEO
LEO: A powerful Hybrid Multimodal LLM
☆18 Updated 3 months ago
OpenGVLab / De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
☆34 Updated 11 months ago
weixi-feng / TC-Bench
☆22 Updated 10 months ago
tyshiwo1 / Accelerating-T2I-AR-with-SJD
[ICLR 2025] Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
☆37 Updated 2 weeks ago
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆29 Updated 5 months ago