facebookresearch / sapiens
High-resolution models for human tasks.
☆5,003 · Updated 5 months ago
Alternatives and similar repositories for sapiens
Users interested in sapiens are comparing it to the libraries listed below.
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,437 · Updated 3 weeks ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆15,436 · Updated 4 months ago
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,300 · Updated 3 months ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,780 · Updated last month
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆5,447 · Updated 3 months ago
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation ☆2,684 · Updated last month
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,508 · Updated 10 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,188 · Updated last week
- 4M: Massively Multimodal Masked Modeling ☆1,721 · Updated 2 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆2,111 · Updated last week
- New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos ☆7,952 · Updated 2 weeks ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,126 · Updated last week
- DUSt3R: Geometric 3D Vision Made Easy ☆6,259 · Updated last month
- [ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. ☆1,882 · Updated 8 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆5,405 · Updated 2 months ago
- [NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image ☆3,396 · Updated 4 months ago
- Text-to-Music Generation with Rectified Flow Transformers ☆1,696 · Updated 5 months ago
- Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling" ☆1,489 · Updated 4 months ago
- SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement ☆1,445 · Updated 3 months ago
- Efficient Triton Kernels for LLM Training ☆5,012 · Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,232 · Updated last week
- [ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation ☆4,141 · Updated last year
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis ☆3,081 · Updated 6 months ago
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,817 · Updated 5 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,273 · Updated 2 weeks ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆2,985 · Updated 2 months ago
- Efficient vision foundation models for high-resolution generation and perception. ☆2,853 · Updated 3 weeks ago
- Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight). ☆9,451 · Updated this week
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,404 · Updated 4 months ago
- Metric depth estimation from a single image ☆2,595 · Updated last week