vision-x-nyu / pisa-experiments
Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (arXiv 2025)
☆28 Updated 3 weeks ago
Alternatives and similar repositories for pisa-experiments:
Users who are interested in pisa-experiments are comparing it to the repositories listed below
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆89 Updated last month
- ☆126 Updated 3 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆66 Updated last month
- ☆28 Updated 4 months ago
- A collection of vision foundation models unifying understanding and generation. ☆49 Updated 3 months ago
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites. ☆56 Updated this week
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ☆26 Updated last week
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning ☆86 Updated this week
- Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆63 Updated 2 months ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆100 Updated 5 months ago
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation ☆86 Updated last week
- Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆99 Updated 5 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆68 Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆75 Updated last week
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆99 Updated 3 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 Updated 3 months ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆65 Updated 5 months ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness". ☆17 Updated 2 weeks ago
- PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆69 Updated 3 weeks ago
- Official code for MotionBench (CVPR 2025) ☆34 Updated last month
- Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ☆107 Updated this week
- Official code repo for the CVPR 2025 paper "PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation" ☆21 Updated 3 weeks ago
- Spatial-R1: The first MLLM trained using GRPO for spatial reasoning in videos ☆21 Updated this week
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] ☆89 Updated 2 months ago
- Frequency Autoregressive Image Generation with Continuous Tokens ☆54 Updated last month
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆30 Updated 10 months ago
- ☆47 Updated 4 months ago
- [NeurIPS 2024] Video Diffusion Models are Training-free Motion Interpreter and Controller ☆34 Updated 2 months ago
- Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers ☆49 Updated this week
- [World-Model-Survey-2024] Paper list and projects for World Model ☆9 Updated 5 months ago