facebookresearch / sapiens
High-resolution models for human tasks.
☆4,809 · Updated 2 months ago
Alternatives and similar repositories for sapiens:
Users interested in sapiens are comparing it to the repositories listed below.
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,125 · Updated 3 weeks ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,484 · Updated 2 weeks ago
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,098 · Updated 4 months ago
- 4M: Massively Multimodal Masked Modeling ☆1,682 · Updated 4 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆14,032 · Updated last month
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆2,903 · Updated this week
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆4,582 · Updated 3 weeks ago
- A general fine-tuning kit geared toward diffusion models. ☆2,077 · Updated this week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆3,328 · Updated this week
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆1,650 · Updated last month
- Efficient Triton Kernels for LLM Training ☆4,415 · Updated this week
- [SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research ☆1,050 · Updated 3 weeks ago
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders ☆487 · Updated last month
- Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accele… ☆7,469 · Updated this week
- ☆3,392 · Updated this week
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,331 · Updated this week
- A suite of image and video neural tokenizers ☆1,553 · Updated this week
- tiny vision language model ☆7,367 · Updated last week
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆5,040 · Updated 3 months ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,151 · Updated 6 months ago
- Code of Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,768 · Updated last month
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,194 · Updated 3 months ago
- SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement ☆1,356 · Updated 3 weeks ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,174 · Updated 4 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,923 · Updated 6 months ago
- Official repository for LTX-Video ☆2,817 · Updated last month
- The best OSS video generation models ☆2,872 · Updated last month
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆2,773 · Updated 6 months ago
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆3,425 · Updated 9 months ago
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,284 · Updated 6 months ago