yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,655Updated last week
Alternatives and similar repositories for samurai:
Users who are interested in samurai are comparing it to the libraries listed below:
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…☆14,761Updated 3 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation☆5,020Updated 2 months ago
- High-resolution models for human tasks.☆4,903Updated 4 months ago
- CoTracker is a model for tracking any point (pixel) on a video.☆4,204Updated 2 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.☆4,284Updated 5 months ago
- Official repository for LTX-Video☆3,221Updated 3 weeks ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2☆1,903Updated this week
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation☆7,410Updated 8 months ago
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25).☆8,692Updated this week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer☆3,796Updated this week
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection☆5,241Updated last month
- [NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image☆3,328Updated 3 months ago
- [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling☆2,862Updated 3 months ago
- YOLOE: Real-Time Seeing Anything☆945Updated this week
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation☆2,619Updated this week
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything☆1,242Updated 4 months ago
- A general fine-tuning kit geared toward diffusion models.☆2,161Updated this week
- YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]☆10,541Updated 2 weeks ago
- SpatialLM: Large Language Model for Spatial Understanding☆2,563Updated this week
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.☆1,422Updated this week
- PyTorch code and models for the DINOv2 self-supervised learning method.☆10,114Updated 7 months ago
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL☆2,521Updated this week
- [CVPR 2025] VGGT: Visual Geometry Grounded Transformer☆3,564Updated this week
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"☆7,730Updated 7 months ago
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding☆953Updated this week
- Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accele…☆7,810Updated last week
- Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"☆1,471Updated 2 months ago
- Official repository for our work on micro-budget training of large-scale diffusion models.☆1,364Updated 2 months ago
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025)☆533Updated this week
- Turn any computer or edge device into a command center for your computer vision projects.☆1,598Updated this week