harrytea / ROOT
ROOT: VLM based System for Indoor Scene Understanding and Beyond
☆25 Updated 3 months ago
Alternatives and similar repositories for ROOT:
Users who are interested in ROOT are comparing it to the repositories listed below.
- ☆22 Updated 3 weeks ago
- ☆34 Updated last year
- Sora Generates Videos with Stunning Geometrical Consistency ☆49 Updated last year
- Amodal Depth Anything: Amodal Depth Estimation in the Wild ☆29 Updated 3 months ago
- Scaling Properties of Diffusion Models For Perceptual Tasks ☆38 Updated 5 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆69 Updated last week
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆23 Updated 3 weeks ago
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆20 Updated 4 months ago
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image" ☆22 Updated 11 months ago
- Spatial-R1: The first MLLM trained using GRPO for spatial reasoning in videos ☆22 Updated last week
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆108 Updated last month
- ☆14 Updated last year
- A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆24 Updated 3 weeks ago
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆87 Updated 2 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆60 Updated 6 months ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness". ☆18 Updated 2 weeks ago
- Open-Vocabulary Panoptic Segmentation ☆23 Updated 7 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories ☆35 Updated last month
- ☆18 Updated last month
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆45 Updated last month
- Open-Vocabulary SAM3D: Understand Any 3D Scene ☆27 Updated 7 months ago
- ☆17 Updated 3 weeks ago
- Cosmos-Transfer1-7B-Sample-AV Toolkits ☆23 Updated this week
- ☆28 Updated 3 months ago
- Official Code for 'AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction' ☆26 Updated last month
- [NeurIPS2024] DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion ☆35 Updated 6 months ago
- Curated list of recent visual autoregressive (VAR) modeling works ☆30 Updated last month
- ☆50 Updated last year
- [ICLR 2025] Official code of "Segment any 3D Object with Language" ☆43 Updated 2 months ago
- Open-source video dataset with annotations for dynamic scenes and camera movements ☆48 Updated last week