facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆2,877Updated this week
Alternatives and similar repositories for dinov3
Users who are interested in dinov3 are comparing it to the repositories listed below.
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!☆1,500Updated last week
- Official repository for "AM-RADIO: Reduce All Domains Into One"☆1,302Updated 2 weeks ago
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2☆2,622Updated 2 months ago
- PyTorch code and models for VJEPA2 self-supervised learning from video.☆2,039Updated this week
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding☆1,178Updated 3 weeks ago
- YOLOE: Real-Time Seeing Anything [ICCV 2025]☆1,607Updated last month
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning☆1,307Updated last month
- Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024☆1,563Updated last year
- Efficient vision foundation models for high-resolution generation and perception.☆3,045Updated 3 months ago
- Efficient Track Anything☆620Updated 7 months ago
- Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series☆1,010Updated 6 months ago
- [CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone☆1,654Updated 3 weeks ago
- A suite of image and video neural tokenizers☆1,666Updated 6 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything☆1,331Updated 3 months ago
- LightlyTrain is the first PyTorch framework to pretrain computer vision models on unlabeled data for industrial applications☆778Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…☆3,482Updated 2 weeks ago
- SpatialLM: Training Large Language Models for Structured Indoor Modeling☆3,750Updated 3 weeks ago
- SAM with text prompt☆2,338Updated last month
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.☆1,354Updated 2 weeks ago
- Hiera: A fast, powerful, and simple hierarchical vision transformer.☆1,012Updated last year
- Code release for DynamicTanh (DyT)☆1,002Updated 4 months ago
- [CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos☆1,309Updated this week
- PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437☆1,154Updated 5 months ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.☆3,078Updated 3 months ago
- 4M: Massively Multimodal Masked Modeling☆1,760Updated 2 months ago
- [CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching☆1,996Updated 2 weeks ago
- This repository is a curated collection of the most exciting and influential CVPR 2025 papers. 🔥 [Paper + Code + Demo]☆753Updated 2 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents☆1,778Updated 2 months ago
- The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"☆633Updated 3 months ago
- [ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention☆868Updated 3 weeks ago