XingruiWang / Spatial457
[CVPR'25] A visual question answering (VQA) benchmark for 6D spatial reasoning.
☆10 · Updated last month
Alternatives and similar repositories for Spatial457
Users interested in Spatial457 are comparing it to the repositories listed below.
- Official Implementation of VideoDPO ☆125 · Updated last month
- ☆49 · Updated 2 months ago
- Official implementation of the paper "Transfer between Modalities with MetaQueries" ☆149 · Updated this week
- A comprehensive list of papers investigating physical cognition in video generation, including papers, code, and related websites. ☆137 · Updated last week
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth? ☆126 · Updated 5 months ago
- Code for the paper "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos" published at CVPR 2024 ☆52 · Updated last year
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation… ☆37 · Updated 4 months ago
- Comparison between the Fréchet Video Distance implementation from StyleGAN-V and the original paper ☆107 · Updated 6 months ago
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ☆92 · Updated 5 months ago
- Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation ☆138 · Updated last month
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆128 · Updated 2 months ago
- [ICML2025] The code and data of the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆114 · Updated 8 months ago
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ☆117 · Updated 9 months ago
- Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation ☆109 · Updated 3 months ago
- ☆47 · Updated 5 months ago
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆90 · Updated last year
- FQGAN: Factorized Visual Tokenization and Generation ☆51 · Updated 3 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆73 · Updated last year
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆178 · Updated this week
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆31 · Updated 9 months ago
- Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language ☆26 · Updated 4 months ago
- ICCV2023-Diffusion-Papers ☆108 · Updated last year
- Code for "Long-Context Autoregressive Video Modeling with Next-Frame Prediction" ☆230 · Updated 2 months ago
- ICML 2025 - Impossible Videos ☆68 · Updated last month
- A list of works on video generation towards world models ☆157 · Updated this week
- [CVPR 2025] MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities ☆20 · Updated last week
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Updated last year
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆63 · Updated 9 months ago
- [ICCV2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆141 · Updated last month
- The official implementation of Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility ☆55 · Updated 5 months ago