facebookresearch / sapiens
High-resolution models for human tasks.
☆4,780 · Updated 2 months ago
Alternatives and similar repositories for sapiens:
Users who are interested in sapiens are comparing it to the libraries listed below.
- Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. ☆4,043 · Updated 3 months ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory" ☆6,408 · Updated this week
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,094 · Updated last week
- [CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation ☆7,230 · Updated 6 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆2,842 · Updated this week
- 4M: Massively Multimodal Masked Modeling ☆1,671 · Updated 3 months ago
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆13,798 · Updated last month
- [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation ☆4,455 · Updated last week
- The best OSS video generation models ☆2,785 · Updated 3 weeks ago
- DUSt3R: Geometric 3D Vision Made Easy ☆5,724 · Updated 4 months ago
- 🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning ☆8,378 · Updated this week
- Official repository for LTX-Video ☆2,665 · Updated 3 weeks ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,143 · Updated 4 months ago
- [CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation ☆2,529 · Updated last month
- Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆4,612 · Updated this week
- ☆3,320 · Updated 3 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,831 · Updated 2 months ago
- Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information ☆9,119 · Updated 5 months ago
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders ☆475 · Updated 2 weeks ago
- HunyuanVideo: A Systematic Framework For Large Video Generation Model ☆7,914 · Updated this week
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,236 · Updated 8 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆4,989 · Updated 2 months ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2 ☆1,568 · Updated last month
- More relighting! ☆7,427 · Updated 2 months ago
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆1,770 · Updated last week
- Code of Pyramidal Flow Matching for Efficient Video Generative Modeling ☆2,728 · Updated last month
- DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 S… ☆1,710 · Updated last month
- Metric depth estimation from a single image ☆2,451 · Updated 8 months ago
- [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation ☆5,528 · Updated 4 months ago