vision-x-nyu / pisa-experiments
Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (arXiv 2025)
☆28 Updated last month
Alternatives and similar repositories for pisa-experiments:
Users who are interested in pisa-experiments are comparing it to the repositories listed below
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites. ☆79 Updated last week
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆93 Updated this week
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆67 Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆69 Updated 2 months ago
- A list of works on video generation towards world model ☆53 Updated this week
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆69 Updated 2 months ago
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation ☆96 Updated 2 weeks ago
- [ICML 2025] Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆101 Updated 6 months ago
- ☆126 Updated 4 months ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆103 Updated 5 months ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness". ☆20 Updated last month
- [ArXiv 2025] WORLDMEM: Long-term Consistent World Simulation with Memory ☆93 Updated this week
- A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆29 Updated last month
- [NeurIPS 2024] Video Diffusion Models are Training-free Motion Interpreter and Controller ☆40 Updated 3 weeks ago
- Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" ☆31 Updated 2 months ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆67 Updated 6 months ago
- PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆76 Updated 2 weeks ago
- ☆29 Updated 5 months ago
- A collection of vision foundation models unifying understanding and generation. ☆55 Updated 4 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆81 Updated last month
- Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention ☆35 Updated 2 weeks ago
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] ☆88 Updated 2 months ago
- Spatial-R1: The first MLLM trained using GRPO for spatial reasoning in videos ☆31 Updated this week
- TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge ☆106 Updated last week
- FQGAN: Factorized Visual Tokenization and Generation ☆50 Updated last month
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆82 Updated 3 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 Updated 3 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆51 Updated last week
- Frequency Autoregressive Image Generation with Continuous Tokens ☆59 Updated last month
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆31 Updated 11 months ago