facebookresearch / sapiens
High-resolution models for human tasks.
☆5,265Updated last year
Alternatives and similar repositories for sapiens
Users who are interested in sapiens are comparing it to the libraries listed below.
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.☆5,188Updated 8 months ago
- CoTracker is a model for tracking any point (pixel) on a video.☆4,783Updated 11 months ago
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation☆7,955Updated last year
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"☆7,032Updated 10 months ago
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation☆7,451Updated 11 months ago
- 4M: Massively Multimodal Masked Modeling☆1,783Updated 7 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…☆3,723Updated last month
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2☆3,217Updated 2 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…☆18,315Updated last year
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.☆2,081Updated last year
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation☆3,055Updated last month
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"☆9,585Updated last year
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything☆2,464Updated last year
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer☆4,894Updated this week
- Efficient vision foundation models for high-resolution generation and perception.☆3,202Updated 4 months ago
- Reference PyTorch implementation and models for DINOv3☆9,327Updated 2 months ago
- [ICCV 2023] Tracking Anything with Decoupled Video Segmentation☆1,474Updated 8 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection☆6,154Updated 10 months ago
- [ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling☆3,148Updated last year
- This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025☆1,388Updated 3 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.☆1,978Updated 2 months ago
- Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.☆1,996Updated last year
- Lumina-T2X is a unified framework for Text to Any Modality Generation☆2,249Updated 11 months ago
- PyTorch code and models for V-JEPA self-supervised learning from video.☆3,447Updated 10 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything☆1,361Updated 8 months ago
- 4DHumans: Reconstructing and Tracking Humans with Transformers☆1,524Updated last year
- A suite of image and video neural tokenizers☆1,699Updated 11 months ago
- The best OSS video generation models, created by Genmo☆3,577Updated 2 months ago
- [ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior☆2,998Updated 8 months ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling☆4,192Updated 3 months ago