yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,829 Updated 2 months ago
Alternatives and similar repositories for samurai
Users who are interested in samurai are comparing it to the repositories listed below.
- High-resolution models for human tasks. ☆5,031 Updated 6 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆15,641 Updated 5 months ago
- Official repository for LTX-Video ☆6,491 Updated last week
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆5,635 Updated 4 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,496 Updated last month
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,328 Updated 4 months ago
- SpatialLM: Large Language Model for Spatial Understanding ☆3,229 Updated 2 months ago
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,566 Updated 10 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆2,231 Updated last week
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning. ☆2,196 Updated this week
- YOLOE: Real-Time Seeing Anything ☆1,325 Updated last month
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,225 Updated 2 weeks ago
- A generative world for general-purpose robotics & embodied AI learning. ☆25,217 Updated this week
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,818 Updated 6 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,297 Updated last month
- [CVPR 2025 Best Paper Award Candidate] VGGT: Visual Geometry Grounded Transformer ☆7,270 Updated this week
- ☆2,994 Updated 2 months ago
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆7,999 Updated last month
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆4,051 Updated last month
- A unified library for object tracking featuring clean room re-implementations of leading multi-object tracking algorithms ☆1,743 Updated this week
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025, Highlight) ☆582 Updated last month
- Enjoy the magic of Diffusion models! ☆8,736 Updated 2 weeks ago
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation ☆2,787 Updated 3 weeks ago
- StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language mo… ☆3,868 Updated last month
- D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight] ☆2,376 Updated last month
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ☆2,511 Updated last month
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,686 Updated last week
- Lets make video diffusion practical! ☆14,060 Updated last month
- Open-source unified multimodal model ☆3,499 Updated last week
- Tracking Any Point (TAP) ☆1,537 Updated 2 weeks ago