facebookresearch/perception_models

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

☆2,324

Alternatives and similar repositories for perception_models

Users that are interested in perception_models are comparing it to the libraries listed below.

facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆10,973 · Updated this week
NVlabs / RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
☆1,897 · May 29, 2026 · Updated last month
facebookresearch / vjepa2
PyTorch code and models for VJEPA2 self-supervised learning from video.
☆4,372 · Mar 23, 2026 · Updated 3 months ago
facebookresearch / sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…
☆19,566 · May 30, 2026 · Updated last month
NVlabs / describe-anything
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
☆1,505 · Jun 26, 2025 · Updated last year
facebookresearch / webssl
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
☆214 · Mar 20, 2026 · Updated 4 months ago
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,846 · Nov 27, 2025 · Updated 7 months ago
apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,425 · Aug 4, 2025 · Updated 11 months ago
facebookresearch / dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
☆13,124 · Jun 3, 2026 · Updated last month
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆6,103 · May 4, 2026 · Updated 2 months ago
bytetriper / RAE
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
☆1,977 · Feb 25, 2026 · Updated 4 months ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,630 · Jan 30, 2026 · Updated 5 months ago
UCSC-VLAA / OpenVision
OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3
☆487 · Feb 21, 2026 · Updated 4 months ago
facebookresearch / sam3
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading t…
☆11,016 · Updated this week
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008 · Nov 7, 2025 · Updated 8 months ago
facebookresearch / vggt
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
☆13,918 · May 19, 2026 · Updated 2 months ago
google-research / big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,494 · May 19, 2025 · Updated last year
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆560 · Apr 3, 2026 · Updated 3 months ago
LLaVA-VL / LLaVA-NeXT
☆4,709 · Jun 15, 2026 · Updated last month
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,842 · Mar 12, 2026 · Updated 4 months ago
allenai / molmo
Code for the Molmo Vision-Language Model
☆918 · Dec 12, 2024 · Updated last year
mlfoundations / open_clip
An open source implementation of CLIP.
☆14,006 · Updated this week
EvolvingLMMs-Lab / LLaVA-OneVision-2
Fully Open Framework for Democratized Multimodal Training
☆1,143 · Updated this week
LTH14 / JiT
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
☆2,459 · Dec 8, 2025 · Updated 7 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,582 · Jun 14, 2025 · Updated last year
ByteDance-Seed / Depth-Anything-3
Depth Anything 3
☆5,917 · Updated this week
baaivision / Emu3.5
Native Multimodal Models are World Learners
☆1,536 · Dec 30, 2025 · Updated 6 months ago
CUT3R / CUT3R
Official implementation of Continuous 3D Perception Model with Persistent State
☆1,464 · Aug 27, 2025 · Updated 10 months ago
nv-tlabs / vipe
ViPE: Video Pose Engine for Geometric 3D Perception
☆2,046 · Jun 9, 2026 · Updated last month
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,963 · Jan 8, 2026 · Updated 6 months ago
facebookresearch / pixio
[CVPR 2026] Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction
☆457 · Updated this week
bytedance / Sa2VA
Official Repo For Pixel-LLM Codebase: Sa2VA (Arxiv-25), SAMTok (CVPR-26), VRT, SaSaSa2VA (1-st solution for LSVOS)
☆1,650 · Jun 19, 2026 · Updated last month
microsoft / MoGe
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
☆2,648 · Nov 2, 2025 · Updated 8 months ago
yyfz / Pi3
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
☆2,072 · Jul 3, 2026 · Updated 2 weeks ago
huggingface / nanoVLM
The simplest, fastest repository for training/finetuning small-sized VLMs.
☆4,957 · Oct 27, 2025 · Updated 8 months ago
google-deepmind / tips
TIPSv2 (CVPR'26) and TIPS (ICLR'25)
☆572 · Jun 1, 2026 · Updated last month
facebookresearch / flow_matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes…
☆4,623 · Jan 5, 2026 · Updated 6 months ago
sihyun-yu / REPA
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
☆1,679 · Mar 16, 2025 · Updated last year
andrehuang / loftup
[ICCV'25 oral] Official Code for "LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models"
☆261 · Jan 13, 2026 · Updated 6 months ago