Qinyu-Allen-Zhao / DiSA
Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation
☆129 Updated last week
Alternatives and similar repositories for DiSA
Users who are interested in DiSA are comparing it to the repositories listed below
- The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding" ☆30 Updated this week
- A collection of vision foundation models unifying understanding and generation. ☆55 Updated 5 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆38 Updated 2 weeks ago
- Denoising Diffusion Step-aware Models (ICLR2024) ☆61 Updated last year
- “FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching” FlowAR employs a simplest scale design and is compatible with an… ☆123 Updated last month
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆155 Updated 2 months ago
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆71 Updated 3 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆34 Updated 3 weeks ago
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆117 Updated 2 weeks ago
- A list of works on video generation towards world model ☆113 Updated this week
- ☆37 Updated last week
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites. ☆110 Updated this week
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆87 Updated last year
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆56 Updated this week
- A Chrome/Edge extension to help you quickly scan through the flood of daily ArXiv papers. ☆14 Updated 2 months ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness". ☆25 Updated 2 months ago
- Code for D-DiT ☆33 Updated 2 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆106 Updated last month
- ReNeg: Learning Negative Embedding with Reward Guidance ☆32 Updated 5 months ago
- ☆68 Updated this week
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆105 Updated this week
- Official Implementation of VideoDPO ☆105 Updated this week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆74 Updated 3 months ago
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆33 Updated last year
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆178 Updated last month
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆70 Updated last week
- FQGAN: Factorized Visual Tokenization and Generation ☆50 Updated 2 months ago
- ☆44 Updated last month
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆39 Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆89 Updated last month