facebookresearch / sapiens
High-resolution models for human tasks.
☆4,967 Updated 5 months ago
Alternatives and similar repositories for sapiens:
Users who are interested in sapiens are comparing it to the libraries listed below:
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,697 Updated last month
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,261 Updated 3 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆15,053 Updated 3 months ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,353 Updated 6 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆5,286 Updated 3 months ago
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation ☆2,658 Updated last month
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,156 Updated 3 weeks ago
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,472 Updated 9 months ago
- 4M: Massively Multimodal Masked Modeling ☆1,714 Updated last month
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆7,920 Updated 3 weeks ago
- [SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research ☆1,102 Updated 3 months ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,024 Updated this week
- A suite of image and video neural tokenizers ☆1,614 Updated 2 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,263 Updated 5 months ago
- [ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior ☆2,901 Updated 8 months ago
- Metric depth estimation from a single image ☆2,569 Updated 11 months ago
- This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinf… ☆897 Updated 5 months ago
- Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling" ☆1,483 Updated 3 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆2,013 Updated this week
- The best OSS video generation models ☆3,102 Updated 3 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆5,331 Updated last month
- Tracking Any Point (TAP) ☆1,474 Updated last week
- [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,900 Updated 4 months ago
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything ☆2,320 Updated 4 months ago
- 4DHumans: Reconstructing and Tracking Humans with Transformers ☆1,353 Updated 11 months ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,271 Updated 6 months ago
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025, Highlight) ☆561 Updated this week
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,788 Updated 4 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,267 Updated 5 months ago
- ☆3,712 Updated 2 months ago