MCG-NJU / SAM2-Plus
SAM 2++: Tracking Anything at Any Granularity
☆51 Updated 2 weeks ago
Alternatives and similar repositories for SAM2-Plus
Users who are interested in SAM2-Plus compare it to the repositories listed below.
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation ☆29 Updated 6 months ago
- (ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations ☆125 Updated last month
- Official implementation of “4D LangVGGT: 4D Language-Visual Geometry Grounded Transformer” ☆67 Updated 3 weeks ago
- [NeurIPS 2025] "DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling" ☆88 Updated last week
- [Arxiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization ☆53 Updated 3 months ago
- Official implementation of DepthLM ☆276 Updated 2 months ago
- ☆26 Updated 9 months ago
- [AAAI 2025] GFlow: Recovering 4D World from Monocular Video ☆61 Updated 7 months ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆42 Updated 4 months ago
- PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆101 Updated 8 months ago
- Implementation of Zero-Shot Video Semantic Segmentation [CVPR 2025] ☆55 Updated 10 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆164 Updated 2 months ago
- Official implementation of "Seurat: From Moving Points to Depth", CVPR 2025 Highlight ☆67 Updated 8 months ago
- ☆121 Updated 6 months ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆116 Updated 9 months ago
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ☆76 Updated 3 months ago
- ☆36 Updated 7 months ago
- [NeurIPS 2025] Official code for JAFAR: Jack up Any Feature at Any Resolution ☆212 Updated last month
- Paper: UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting ☆29 Updated 6 months ago
- Official implementation of "Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation". ☆135 Updated last week
- [NeurIPS 2025] Official Implementation of DINO-Foresight: Looking into the Future with DINO ☆137 Updated last month
- [NeurIPS 2025] LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS ☆165 Updated 2 months ago
- (3DV 2026 Oral) L4P -- a feed-forward foundational model designed for multiple low-level 4D vision perception tasks. ☆49 Updated 3 weeks ago
- Scene-Centric Unsupervised Panoptic Segmentation (CVPR 2025 Highlight) ☆77 Updated 3 months ago
- ☆26 Updated 9 months ago
- Official implementation of "Exploring Temporally-Aware Features for Point Tracking" (CVPR 2025) ☆104 Updated 8 months ago
- [CVPR'2025] EntitySAM: Segment Everything in Video ☆58 Updated 5 months ago
- [ICLR 2025] MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow ☆24 Updated 8 months ago
- Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction ☆255 Updated last week
- Seeing World Dynamics in a Nutshell ☆111 Updated 9 months ago