facebookresearch / sapiens
High-resolution models for human tasks.
☆5,053 Updated 7 months ago
Alternatives and similar repositories for sapiens
Users who are interested in sapiens are comparing it to the libraries listed below.
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,576 Updated 2 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆15,908 Updated 6 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆5,840 Updated 5 months ago
- A general fine-tuning kit geared toward diffusion models. ☆2,386 Updated last week
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,391 Updated 5 months ago
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,596 Updated last month
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,608 Updated 11 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,344 Updated last week
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation ☆2,829 Updated last month
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,300 Updated 3 weeks ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,847 Updated 3 months ago
- 4M: Massively Multimodal Masked Modeling ☆1,740 Updated 3 weeks ago
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models ☆1,327 Updated 3 weeks ago
- [SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research ☆1,372 Updated 5 months ago
- ☆2,119 Updated 7 months ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,102 Updated 4 months ago
- [ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints" ☆695 Updated 10 months ago
- On-device AI across mobile, embedded and edge for PyTorch ☆2,980 Updated this week
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆2,347 Updated last month
- Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling" ☆1,513 Updated last week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,484 Updated this week
- The best OSS video generation models ☆3,231 Updated 5 months ago
- Efficient Triton Kernels for LLM Training ☆5,246 Updated last week
- [ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters totally), 2)… ☆1,423 Updated 4 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,305 Updated 2 months ago
- [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,979 Updated 6 months ago
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025, Highlight) ☆719 Updated 2 months ago
- [ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation ☆4,164 Updated last year
- 4DHumans: Reconstructing and Tracking Humans with Transformers ☆1,388 Updated last year
- Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model. ☆1,918 Updated last year