yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,710 Updated last month
Alternatives and similar repositories for samurai:
Users who are interested in samurai are comparing it to the repositories listed below.
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆15,145 Updated 4 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,373 Updated this week
- Official repository for LTX-Video ☆3,506 Updated last week
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,261 Updated 3 months ago
- YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open ☆4,842 Updated 2 weeks ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆5,286 Updated 3 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,024 Updated this week
- tiny vision language model ☆7,817 Updated last week
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆7,920 Updated 3 weeks ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,687 Updated last month
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,472 Updated 9 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆5,331 Updated last month
- [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation ☆5,770 Updated last month
- Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expres… ☆6,452 Updated last month
- Text-to-Music Generation with Rectified Flow Transformers ☆1,688 Updated 4 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,265 Updated 5 months ago
- MAGI-1: Autoregressive Video Generation at Scale ☆2,056 Updated this week
- The best OSS video generation models ☆3,102 Updated 3 months ago
- [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,900 Updated 4 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆2,013 Updated this week
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆1,005 Updated this week
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆9,770 Updated last week
- High-resolution models for human tasks. ☆4,967 Updated 5 months ago
- YOLOE: Real-Time Seeing Anything ☆1,155 Updated 3 weeks ago
- A suite of image and video neural tokenizers ☆1,619 Updated 2 months ago
- Towards Human-Sounding Speech ☆4,490 Updated last week
- Official inference repo for FLUX.1 models ☆21,423 Updated 2 months ago
- [CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation ☆1,045 Updated last week
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ☆2,460 Updated this week
- 4M: Massively Multimodal Masked Modeling ☆1,714 Updated last month