facebookresearch / sapiens

High-resolution models for human tasks.

☆5,178

Alternatives and similar repositories for sapiens

Users who are interested in sapiens are comparing it to the repositories listed below.

facebookresearch / co-tracker
CoTracker is a model for tracking any point (pixel) on a video.
☆4,623 Updated 9 months ago
apple / ml-depth-pro
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
☆4,943 Updated 6 months ago
LiheYoung / Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
☆7,800 Updated last year
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,614 Updated last week
IDEA-Research / Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
☆2,917 Updated 2 weeks ago
NVIDIA / Cosmos-Tokenizer
A suite of image and video neural tokenizers
☆1,675 Updated 8 months ago
DepthAnything / Depth-Anything-V2
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
☆6,825 Updated 9 months ago
apple / ml-4m
4M: Massively Multimodal Masked Modeling
☆1,766 Updated 4 months ago
google-deepmind / tapnet
Tracking Any Point (TAP)
☆1,691 Updated last week
prs-eth / Marigold
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
☆2,968 Updated 5 months ago
facebookresearch / sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…
☆17,432 Updated 10 months ago
yangchris11 / samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
☆6,978 Updated 7 months ago
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆1,698 Updated last month
jy0205 / Pyramid-Flow
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
☆3,102 Updated 10 months ago
zju3dv / EasyVolcap
[SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research
☆1,486 Updated 9 months ago
feizc / FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
☆1,707 Updated 10 months ago
manycore-research / SpatialLM
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
☆4,055 Updated last month
SonyResearch / micro_diffusion
Official repository for our work on micro-budget training of large-scale diffusion models.
☆1,521 Updated 9 months ago
Tencent / DepthCrafter
[CVPR 2025 Highlight] DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
☆1,458 Updated 3 months ago
facebookresearch / vggt
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
☆11,418 Updated 2 weeks ago
microsoft / MoGe
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
☆1,856 Updated last week
NVlabs / Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
☆4,613 Updated last week
AiuniAI / Unique3D
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
☆3,485 Updated 3 months ago
facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆7,965 Updated this week
siyuanliii / masa
Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything
☆1,346 Updated 5 months ago
shubham-goel / 4D-Humans
4DHumans: Reconstructing and Tracking Humans with Transformers
☆1,448 Updated last year
DepthAnything / Video-Depth-Anything
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
☆1,506 Updated 3 weeks ago
mit-han-lab / efficientvit
Efficient vision foundation models for high-resolution generation and perception.
☆3,108 Updated last month
facebookresearch / jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
☆3,245 Updated 8 months ago
facebookresearch / chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
☆2,062 Updated last year