yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,599 Updated 3 weeks ago
Alternatives and similar repositories for samurai:
Users who are interested in samurai are comparing it to the repositories listed below.
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆14,461 Updated 2 months ago
- High-resolution models for human tasks. ☆4,878 Updated 3 months ago
- Official repository for LTX-Video ☆3,109 Updated last week
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,214 Updated 5 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆4,851 Updated last month
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,167 Updated last month
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆1,813 Updated 2 months ago
- 4M: Massively Multimodal Masked Modeling ☆1,691 Updated this week
- [CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation ☆1,942 Updated this week
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,353 Updated 7 months ago
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,436 Updated this week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆3,527 Updated this week
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,230 Updated 4 months ago
- A generative world for general-purpose robotics & embodied AI learning. ☆24,304 Updated this week
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆9,163 Updated last week
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,332 Updated 2 months ago
- The best OSS video generation models ☆2,998 Updated 2 months ago
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR 25). ☆8,255 Updated 2 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆10,222 Updated 2 weeks ago
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆910 Updated last month
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion ☆1,210 Updated 3 months ago
- [CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System ☆3,023 Updated 2 weeks ago
- [NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment ☆3,153 Updated 3 months ago
- Official inference repo for FLUX.1 models ☆20,696 Updated last month
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,621 Updated 2 weeks ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆5,145 Updated 2 weeks ago
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025) ☆513 Updated this week
- Tiny vision language model ☆7,560 Updated 2 weeks ago