yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,847 · Updated 3 months ago
Alternatives and similar repositories for samurai
Users interested in samurai are comparing it to the repositories listed below
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆15,908 · Updated 6 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆2,347 · Updated last month
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆1,774 · Updated this week
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,576 · Updated 2 months ago
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,391 · Updated 5 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆5,840 · Updated 5 months ago
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight). ☆9,889 · Updated 3 weeks ago
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆4,233 · Updated last month
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,300 · Updated 3 weeks ago
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning. ☆2,269 · Updated 2 weeks ago
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,578 · Updated this week
- High-resolution models for human tasks. ☆5,053 · Updated 7 months ago
- SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆3,395 · Updated 2 weeks ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,305 · Updated last month
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,344 · Updated last week
- [CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos ☆1,097 · Updated this week
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025, Highlight) ☆719 · Updated 2 months ago
- YOLOE: Real-Time Seeing Anything ☆1,364 · Updated last month
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆8,026 · Updated 2 weeks ago
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆1,096 · Updated last week
- Official repository for LTX-Video ☆6,745 · Updated last month
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆5,620 · Updated 4 months ago
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ☆8,289 · Updated 10 months ago
- Images to inference with no labeling (use foundation models to train supervised models). ☆2,300 · Updated last month
- StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language mo… ☆3,912 · Updated 2 months ago
- Wan: Open and Advanced Large-Scale Video Generative Models ☆12,404 · Updated 2 weeks ago
- [CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer ☆8,837 · Updated this week
- A general fine-tuning kit geared toward diffusion models. ☆2,386 · Updated last week
- ☆3,030 · Updated 3 months ago
- Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,179 · Updated last month