IDEA-Research/Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

☆3,652

Alternatives and similar repositories for Grounded-SAM-2

Users who are interested in Grounded-SAM-2 are comparing it to the repositories listed below.

facebookresearch / sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…
☆19,577 · May 30, 2026 · Updated last month
IDEA-Research / GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
☆10,437 · Aug 12, 2024 · Updated last year
IDEA-Research / Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and …
☆17,686 · Sep 5, 2024 · Updated last year
facebookresearch / sam3
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading t…
☆11,063 · Jul 15, 2026 · Updated last week
facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆10,993 · Jul 15, 2026 · Updated last week
IDEA-Research / Grounding-DINO-1.5-API
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
☆1,139 · Jan 21, 2025 · Updated last year
IDEA-Research / DINO-X-API
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
☆1,399 · Jul 23, 2025 · Updated last year
facebookresearch / dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
☆13,150 · Jun 3, 2026 · Updated last month
UX-Decoder / Semantic-SAM
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
☆2,853 · Jul 10, 2025 · Updated last year
facebookresearch / vggt
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
☆13,968 · May 19, 2026 · Updated 2 months ago
DepthAnything / Depth-Anything-V2
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
☆8,521 · Mar 24, 2026 · Updated 4 months ago
ByteDance-Seed / Depth-Anything-3
Depth Anything 3
☆5,956 · Jul 15, 2026 · Updated last week
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,650 · Jan 30, 2026 · Updated 5 months ago
facebookresearch / segment-anything
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…
☆54,590 · Sep 18, 2024 · Updated last year
microsoft / MoGe
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
☆2,672 · Updated this week
bytedance / Sa2VA
Official Repo For Pixel-LLM Codebase: Sa2VA (Arxiv-25), SAMTok (CVPR-26), VRT, SaSaSa2VA (1-st solution for LSVOS)
☆1,650 · Jun 19, 2026 · Updated last month
NVlabs / FoundationPose
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
☆3,446 · Apr 29, 2026 · Updated 2 months ago
NVlabs / FoundationStereo
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
☆2,834 · Dec 19, 2025 · Updated 7 months ago
facebookresearch / map-anything
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
☆3,585 · Jul 17, 2026 · Updated last week
facebookresearch / sam-3d-objects
SAM 3D Objects
☆7,161 · Jun 2, 2026 · Updated last month
luca-medeiros / lang-segment-anything
SAM with text prompt
☆2,593 · Aug 28, 2025 · Updated 10 months ago
yyfz / Pi3
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
☆2,084 · Jul 3, 2026 · Updated 3 weeks ago
naver / mast3r
Grounding Image Matching in 3D with MASt3R
☆3,051 · Jun 30, 2025 · Updated last year
NVlabs / RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
☆1,900 · May 29, 2026 · Updated last month
AILab-CVC / YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
☆6,478 · Feb 26, 2025 · Updated last year
nv-tlabs / vipe
ViPE: Video Pose Engine for Geometric 3D Perception
☆2,049 · Jun 9, 2026 · Updated last month
LiheYoung / Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
☆8,167 · Jul 17, 2024 · Updated 2 years ago
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆2,330 · Apr 13, 2026 · Updated 3 months ago
xinyu1205 / recognize-anything
Open-source and strong foundation image recognition models.
☆3,691 · Feb 18, 2025 · Updated last year
Gy920 / segment-anything-2-real-time
Run Segment Anything Model 2 on a live video stream
☆592 · Jun 3, 2025 · Updated last year
mega-sam / mega-sam
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
☆1,336 · Jan 5, 2026 · Updated 6 months ago
patrick-tssn / Streaming-Grounded-SAM-2
Grounded Tracking for Streaming Videos
☆127 · Oct 10, 2024 · Updated last year
CUT3R / CUT3R
Official implementation of Continuous 3D Perception Model with Persistent State
☆1,468 · Aug 27, 2025 · Updated 10 months ago
LLaVA-VL / LLaVA-NeXT
☆4,712 · Jun 15, 2026 · Updated last month
UX-Decoder / Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
☆4,795 · Aug 19, 2024 · Updated last year
THU-MIG / yoloe
YOLOE: Real-Time Seeing Anything [ICCV 2025]
☆2,214 · Jun 26, 2025 · Updated last year
DepthAnything / Video-Depth-Anything
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
☆2,005 · Oct 7, 2025 · Updated 9 months ago
openvla / openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
☆6,688 · Mar 23, 2025 · Updated last year
z-x-yang / Segment-and-Track-Anything
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary alg…
☆3,134 · Jul 3, 2026 · Updated 2 weeks ago