yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆7,017 Updated 8 months ago
Alternatives and similar repositories for samurai
Users who are interested in samurai are comparing it to the libraries listed below.
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆5,053 Updated 7 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆17,954 Updated 11 months ago
- High-resolution models for human tasks. ☆5,233 Updated last year
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,695 Updated 10 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,773 Updated this week
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,642 Updated last week
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆2,184 Updated last week
- YOLOE: Real-Time Seeing Anything [ICCV 2025] ☆1,934 Updated 5 months ago
- Reference PyTorch implementation and models for DINOv3 ☆8,731 Updated 2 weeks ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆7,164 Updated 10 months ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,112 Updated 2 months ago
- RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tun… ☆4,624 Updated 3 weeks ago
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,911 Updated last year
- [CVPR 2025] Official PyTorch implementation of "EdgeTAM: On-Device Track Anything Model" ☆827 Updated last month
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆3,088 Updated last month
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,908 Updated last year
- [CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation ☆2,969 Updated last month
- The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading t… ☆5,245 Updated last week
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆8,056 Updated 6 months ago
- [CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer ☆11,904 Updated 2 months ago
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation ☆3,015 Updated 6 months ago
- tiny vision language model ☆8,983 Updated 3 weeks ago
- Turn any computer or edge device into a command center for your computer vision projects. ☆2,090 Updated this week
- Images to inference with no labeling (use foundation models to train supervised models). ☆2,502 Updated 6 months ago
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025, Highlight) ☆800 Updated 7 months ago
- [CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors ☆2,693 Updated last month
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆7,029 Updated 7 months ago
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ☆2,605 Updated last month
- 4M: Massively Multimodal Masked Modeling ☆1,776 Updated 6 months ago
- Efficient Track Anything ☆680 Updated 11 months ago