huang-yh / SpectralAR
[ICCV 25] SpectralAR: Spectral Autoregressive Visual Generation
☆26Updated last month
Alternatives and similar repositories for SpectralAR
Users who are interested in SpectralAR are comparing it to the repositories listed below.
- Self-reimplemented version of 4D-LRM.☆47Updated last month
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".☆44Updated 3 weeks ago
- A list of works on video generation towards world models☆157Updated last week
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆43Updated 2 months ago
- GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography☆66Updated this week
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆40Updated 2 weeks ago
- The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding"☆67Updated last month
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)☆73Updated 4 months ago
- SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis☆35Updated last month
- WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes☆95Updated 4 months ago
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance☆39Updated last month
- A collection of vision foundation models unifying understanding and generation.☆57Updated 6 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆85Updated 3 weeks ago
- Official implementation of the paper "Transfer between Modalities with MetaQueries"☆149Updated this week
- [CVPR 2025] UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting☆37Updated 2 weeks ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆69Updated last week
- PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting☆86Updated 3 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆135Updated last month
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆33Updated 3 months ago
- The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs"☆53Updated 2 months ago
- [AAAI 2025] GFlow: Recovering 4D World from Monocular Video☆46Updated 2 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆53Updated last week
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation☆109Updated 8 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆76Updated 4 months ago
- [CVPR 2024] Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training☆40Updated last year
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)☆38Updated 2 months ago
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation☆118Updated 2 weeks ago
- [CVPR 2025] GPS as a Control Signal for Image Generation☆20Updated 4 months ago
- Official Code for 'TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction' (ICCV 2025)☆58Updated 6 months ago
- ☆51Updated last month