Qinyu-Allen-Zhao / DiSA
Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation
☆142 Updated 4 months ago
Alternatives and similar repositories for DiSA
Users who are interested in DiSA are comparing it to the repositories listed below.
- Official Implementation of Paper Transfer between Modalities with MetaQueries ☆249 Updated last week
- [NeurIPS 2025] MLLMs Need 3D-Aware Representation Supervision for Scene Understanding ☆108 Updated 3 weeks ago
- Official repo for ICCV25 Highlight Paper: "ObjectRelator: Enabling Cross-View Object Relation Understanding in Ego-Centric and Exo-Centric… ☆51 Updated last week
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆198 Updated 3 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆57 Updated 2 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆120 Updated 6 months ago
- ☆39 Updated 4 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆49 Updated 2 months ago
- A Chrome/Edge extension to help you quickly scan through the flood of daily ArXiv papers. ☆15 Updated 6 months ago
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites. ☆183 Updated last week
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆53 Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆122 Updated last month
- official training and inference code of bitwise tokenizer ☆46 Updated 5 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆41 Updated 5 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆127 Updated 11 months ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆135 Updated 2 months ago
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆92 Updated last year
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆174 Updated 4 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ☆224 Updated 4 months ago
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆92 Updated 8 months ago
- A list of works on video generation towards world model ☆167 Updated 2 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… ☆34 Updated 2 months ago
- ReNeg: Learning Negative Embedding with Reward Guidance ☆35 Updated 9 months ago
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025] ☆73 Updated 3 months ago
- [CVPR’25] PIVRG & ConsMTL ☆16 Updated 4 months ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆80 Updated this week
- (ECCV 2024) Official repository of paper "EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding" ☆30 Updated 6 months ago
- Official implementation of the CVPR'24 paper [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number] ☆57 Updated 8 months ago
- ☆28 Updated 7 months ago
- A collection of vision foundation models unifying understanding and generation. ☆56 Updated 9 months ago