facebookresearch / sapiens
High-resolution models for human tasks.
☆3,890 · Updated this week
Related projects:
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… · ☆10,811 · Updated 3 weeks ago
- Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation · ☆3,314 · Updated last month
- Run PyTorch LLMs locally on servers, desktop and mobile · ☆3,116 · Updated this week
- tiny vision language model · ☆4,893 · Updated 3 weeks ago
- Your image is almost there! · ☆7,215 · Updated last month
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation · ☆6,769 · Updated 2 months ago
- Create Magic Story! · ☆5,787 · Updated last month
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. · ☆1,758 · Updated last month
- PyTorch code and models for V-JEPA self-supervised learning from video. · ☆2,614 · Updated last month
- Efficient Triton Kernels for LLM Training · ☆2,947 · Updated this week
- Text-to-Music Generation with Rectified Flow Transformers · ☆1,421 · Updated last week
- 🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning · ☆6,362 · Updated this week
- 4M: Massively Multimodal Masked Modeling · ☆1,543 · Updated 2 months ago
- Official inference repo for FLUX.1 models · ☆13,678 · Updated this week
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection · ☆4,360 · Updated last month
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. · ☆2,183 · Updated 2 months ago
- Enjoy the magic of Diffusion models! · ☆6,349 · Updated this week
- Official Code for Stable Cascade · ☆6,511 · Updated last month
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" · ☆3,178 · Updated 4 months ago
- [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation · ☆5,127 · Updated last week
- YOLOv10: Real-Time End-to-End Object Detection · ☆9,343 · Updated last month
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… · ☆1,786 · Updated last week
- DUSt3R: Geometric 3D Vision Made Easy · ☆5,019 · Updated last month
- ☆4,269 · Updated last month
- Code and dataset for photorealistic Codec Avatars driven from audio · ☆2,669 · Updated this week
- Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale. · ☆2,096 · Updated this week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild · ☆7,459 · Updated 2 months ago
- Mora: More like Sora for Generalist Video Generation · ☆1,475 · Updated 2 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model · ☆3,389 · Updated this week
- Inference and training library for high-quality TTS models. · ☆4,220 · Updated last month