harrytea / ROOT
ROOT: VLM based System for Indoor Scene Understanding and Beyond
☆29 · Updated 5 months ago
Alternatives and similar repositories for ROOT
Users who are interested in ROOT are comparing it to the libraries listed below.
- Sora Generates Videos with Stunning Geometrical Consistency ☆50 · Updated last year
- Scaling Properties of Diffusion Models For Perceptual Tasks (CVPR 2025) ☆39 · Updated last month
- Code for the ICML 2025 paper "Highly Compressed Tokenizer Can Generate Without Training" ☆80 · Updated 2 weeks ago
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆179 · Updated this week
- Self-reimplemented version of 4D-LRM ☆30 · Updated 3 weeks ago
- The official repository for paper "MLLMs Need 3D-Aware Representation Supervision for Scene Understanding" ☆58 · Updated 2 weeks ago
- Official Code for 'AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction' ☆29 · Updated last month
- [NeurIPS2024] DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion ☆37 · Updated 9 months ago
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image" ☆22 · Updated last year
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness" ☆29 · Updated 2 weeks ago
- Open-sourced video dataset with dynamic scenes and camera-movement annotations ☆61 · Updated 2 months ago
- [ECCV2024] Official Implementation of "NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image" ☆29 · Updated 6 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆63 · Updated 2 weeks ago
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆33 · Updated this week
- Curated list of recent visual autoregressive (VAR) modeling works ☆29 · Updated 3 months ago
- PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆83 · Updated 2 months ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆109 · Updated 3 months ago
- Amodal Depth Anything: Amodal Depth Estimation in the Wild ☆31 · Updated 5 months ago
- Open-Vocabulary SAM3D: Understand Any 3D Scene ☆30 · Updated 2 weeks ago
- Official code for "JAFAR: Jack up Any Feature at Any Resolution" ☆124 · Updated last week
- [ICLR 2025] Official code of "Segment any 3D Object with Language" ☆49 · Updated this week
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning ☆30 · Updated 2 months ago
- A list of works on video generation towards world models ☆151 · Updated this week
- Official implementation of our paper "Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images" ☆44 · Updated 2 weeks ago