yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,780 · Updated last month
Alternatives and similar repositories for samurai
Users who are interested in samurai are comparing it to the libraries listed below
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆15,436 · Updated 4 months ago
- High-resolution models for human tasks. ☆5,003 · Updated 5 months ago
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,300 · Updated 3 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆5,447 · Updated 3 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,437 · Updated 3 weeks ago
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,508 · Updated 10 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆2,111 · Updated last week
- YOLOE: Real-Time Seeing Anything ☆1,227 · Updated 2 weeks ago
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation ☆2,684 · Updated last month
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆7,952 · Updated 2 weeks ago
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight). ☆9,451 · Updated this week
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆9,999 · Updated 2 weeks ago
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆1,441 · Updated this week
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,273 · Updated 2 weeks ago
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning. ☆2,066 · Updated this week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,126 · Updated last week
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆1,033 · Updated 3 weeks ago
- Official repository for LTX-Video ☆5,436 · Updated this week
- Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling" ☆1,489 · Updated 4 months ago
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆10,354 · Updated last week
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ☆8,022 · Updated 9 months ago
- YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024] ☆10,707 · Updated 2 months ago
- SAM with text prompt ☆2,153 · Updated last week
- Run Segment Anything Model 2 on a live video stream ☆387 · Updated 3 months ago
- [CVPR 2025 Highlight] DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos ☆1,301 · Updated 3 weeks ago
- D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight] ☆2,246 · Updated last month
- Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI… ☆6,702 · Updated 11 months ago
- Images to inference with no labeling (use foundation models to train supervised models). ☆2,254 · Updated this week
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,701 · Updated this week
- Segment Anything in High Quality [NeurIPS 2023] ☆3,930 · Updated 5 months ago