patrick-tssn / Streaming-Grounded-SAM-2
Grounded Tracking for Streaming Videos
☆46 · Updated last month
Related projects
Alternatives and complementary repositories for Streaming-Grounded-SAM-2
- Run Segment Anything Model 2 on a live video stream ☆172 · Updated last month
- ☆142 · Updated 4 months ago
- [ECCV 2024] Official implementation of the paper "TAPTR: Tracking Any Point with Transformers as Detection" ☆199 · Updated 2 months ago
- ☆210 · Updated 4 months ago
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆163 · Updated 3 weeks ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆85 · Updated 5 months ago
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ☆44 · Updated last week
- Muggled SAM: Segmentation without the magic ☆54 · Updated this week
- Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding" ☆194 · Updated 2 weeks ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆61 · Updated 2 weeks ago
- Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024) ☆125 · Updated this week
- [ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning ☆229 · Updated 2 weeks ago
- Official Implementation of paper "Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence" ☆95 · Updated this week
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning ☆160 · Updated last month
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆159 · Updated 3 weeks ago
- ☆48 · Updated 2 months ago
- [ECCV 2024] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation ☆174 · Updated this week
- Code & Data for Grounded 3D-LLM with Referent Tokens ☆89 · Updated last month
- This is the official repository for OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data. (CoRL'23) ☆93 · Updated last year
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ☆122 · Updated 2 weeks ago
- [ECCV 2024] Decomposition Betters Tracking Everything Everywhere ☆111 · Updated 4 months ago
- Grounded-SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and … ☆38 · Updated last year
- SceneFun3D ToolKit ☆77 · Updated 3 weeks ago
- [ECCV 2024] ShapeLLM: Universal 3D Object Understanding for Embodied Interaction ☆141 · Updated last month
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆79 · Updated 6 months ago
- Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ☆190 · Updated last year
- ☆92 · Updated last year
- Official Code for Tracking Any Object Amodally ☆113 · Updated 4 months ago
- [CVPR 2024] PyTorch implementation of NOPE: Novel Object Pose Estimation from a Single Image ☆182 · Updated 7 months ago