ROOT: VLM based System for Indoor Scene Understanding and Beyond
☆40Jan 22, 2025Updated last year
Alternatives and similar repositories for ROOT
Users that are interested in ROOT are comparing it to the libraries listed below.
- Neuroscience Inspired Agent Reasoning Framework☆29May 19, 2025Updated 10 months ago
- Official PyTorch implementation for TCSVT 23 "Detect Any Shadow: Segment Anything for Video Shadow Detection"☆65Nov 28, 2024Updated last year
- Introduces a novel Video Trimming (VT) task and proposes an agent-based approach (AVT) for detecting wasted footage, selecting valuable se…☆23Jan 20, 2025Updated last year
- Extreme Rotation Estimation using Dense Correlation Volumes☆44Jan 10, 2023Updated 3 years ago
- Embodied Instruction Following in Unknown Environments☆17Dec 8, 2025Updated 3 months ago
- SuperGS: Super-Resolution 3D Gaussian Splatting Enhanced by Variational Residual Features and Uncertainty-Augmented Learning☆11May 24, 2025Updated 10 months ago
- Code for [AAAI 2026] AffordDex: Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors☆28Dec 26, 2025Updated 3 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- Manipulating semantic data within Python☆18Jan 14, 2025Updated last year
- Collections of object goal navigation papers in recent top-tier conferences.☆14Sep 24, 2022Updated 3 years ago
- ReSemAct: Advancing Fine-Grained Robotic Manipulation via Semantic Structuring and Affordance Refinement☆17Jan 5, 2026Updated 2 months ago
- LVAS-Agent Code Base☆22Apr 15, 2025Updated 11 months ago
- [WACV 2025] Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge☆40Oct 29, 2024Updated last year
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"☆15Aug 26, 2025Updated 6 months ago
- Official implementation of the ECCV 2022 paper "CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillati…☆37Oct 5, 2022Updated 3 years ago
- Corpus to accompany: "Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding"☆11Apr 11, 2025Updated 11 months ago
- CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning☆38Mar 21, 2025Updated last year
- Code for RA-L paper "One-shot Learning for Task-oriented Grasping"☆12May 9, 2024Updated last year
- ☆14Mar 23, 2024Updated 2 years ago
- ECCV2020_Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising☆12Sep 24, 2020Updated 5 years ago
- ☆27Jun 28, 2022Updated 3 years ago
- [RA-L] DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding☆18Apr 17, 2024Updated last year
- Using Segment-Anything and CLIP to generate pixel-aligned semantic features.☆40Apr 27, 2023Updated 2 years ago
- [IEEE RA-L & ICRA 2026] Semantic-Driven Voxel Representation for LiDAR–Inertial Odometry☆41Nov 20, 2025Updated 4 months ago
- This repository implements computer vision for real-time chessboard detection and piece recognition. Using OpenCV and Numpy, the system p…☆13Sep 24, 2024Updated last year
- FSD Tesla Open-source. Real-Time Environment Reconstruction System for Autonomous Vehicles☆17Jan 8, 2026Updated 2 months ago
- CAR: Class-aware Regularizations for Semantic Segmentation (ECCV-2022)☆30Oct 26, 2022Updated 3 years ago
- Official repository for gathering data of Revisit Human-Scene Interaction via Space Occupancy (ECCV 2024).☆28Sep 29, 2024Updated last year
- The official repository for the paper "Statler: State-Maintaining Language Models for Embodied Reasoning"☆13Jun 10, 2024Updated last year
- Code for paper: "Few-Shot In-Context Imitation Learning via Implicit Graph Alignment"☆22Apr 5, 2024Updated last year
- Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation☆14Jan 31, 2026Updated last month
- Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement☆10Jan 24, 2022Updated 4 years ago
- ☆11Aug 29, 2025Updated 6 months ago
- Pixel-ImageNet☆45Feb 24, 2022Updated 4 years ago
- EKF, EIF and SEIF for SLAM☆10Nov 16, 2018Updated 7 years ago
- Implementation of the Paper Scene-Graph ViT☆10Dec 20, 2024Updated last year
- Hands-On Tutorial on Building Multimodal RAG Systems☆13Apr 10, 2025Updated 11 months ago
- Archives for Triton Inference Server Practices☆15Feb 28, 2022Updated 4 years ago
- ECCV 2022: Learning Shadow Correspondence for Video Shadow Detection☆13Jul 18, 2022Updated 3 years ago