om-ai-lab/VLM-FO1

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

☆329

Alternatives and similar repositories for VLM-FO1

Users who are interested in VLM-FO1 are comparing it to the libraries listed below.

IDEA-Research / Rex-Omni
[CVPR2026] Detect Anything via Next Point Prediction
☆1,509 · Feb 22, 2026 · Updated 5 months ago
WeChatCV / WeDetect
(CVPR 2026) Official repository of paper "WeDetect: Fast Open-Vocabulary Object Detection as Retrieval"
☆241 · Jun 7, 2026 · Updated last month
om-ai-lab / OVDEval
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
☆63 · Apr 10, 2026 · Updated 3 months ago
Gorilla-Lab-SCUT / PaDT
[ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"
☆161 · Oct 31, 2025 · Updated 8 months ago
Intellindust-AI-Lab / FT-FSOD
[CVPR 2026] A Closer Look at Cross-Domain Few-Shot Object Detection: Fine-Tuning Matters and Parallel Decoder Helps
☆47 · May 11, 2026 · Updated 2 months ago
miquel-espinosa / no-time-to-train
Official code for "No time to train! Training-Free Reference-Based Instance Segmentation"
☆313 · Apr 14, 2026 · Updated 3 months ago
debby-0527 / SAM3-I
Official code and resources for SAM3-I.
☆175 · Apr 14, 2026 · Updated 3 months ago
JiazuoYu / Fines
Code for the paper "FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning" (NeurIPS 2025).
☆15 · Jan 29, 2026 · Updated 5 months ago
Intellindust-AI-Lab / DEIMv2
[DEIMv2] Real Time Object Detection Meets DINOv3
☆1,935 · Mar 24, 2026 · Updated 3 months ago
YuHengsss / SD-RPN
[ICLR2026] Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
☆17 · Jan 26, 2026 · Updated 5 months ago
PKU-ICST-MIPL / DyFo_CVPR2025
☆116 · Aug 14, 2025 · Updated 11 months ago
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,012 · Jul 7, 2026 · Updated 2 weeks ago
yujunhuics / Reyes
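2025.01: Implemented a multimodal large model from scratch and named it Reyes (睿视); "R" stands for 睿 ("insightful") and "eyes" for 眼 ("eye"). Reyes has 8B parameters; it uses InternViT-300M-448px-V2_5 as the vision encoder and Qwen2.5-7B-Instruct as the language model. Reyes also, through a two…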
☆34 · Feb 10, 2026 · Updated 5 months ago
fuweifuvtoo / PET_DINO
[CVPR 2026 Highlight 🔥] PET-DINO: Unifying Visual Cues into Grounding DINO with Prompt-Enriched Training
☆42 · May 6, 2026 · Updated 2 months ago
THU-MIG / yoloe
YOLOE: Real-Time Seeing Anything [ICCV 2025]
☆2,209 · Jun 26, 2025 · Updated last year
rui-qian / UGround
Rui Qian, Xin Yin, Chuanhang Deng, et al.: UGround: Towards Unified Visual Grounding with Unrolled Transformers (ICML 2026)
☆29 · Jun 18, 2026 · Updated last month
Tencent / YOLO-Master
[CVPR2026]🚀🚀🚀Official code for the paper "YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detectio…
☆595 · Updated this week
IDEA-Research / Rex-Thinker
[ICLR-2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
☆150 · Jun 30, 2025 · Updated last year
WeChatCV / WeVisionOne
☆64 · Nov 11, 2025 · Updated 8 months ago
YuHengsss / Trident
[ICCV2025] Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation
☆125 · Nov 22, 2025 · Updated 8 months ago
PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆247 · Jan 4, 2026 · Updated 6 months ago
wanghao9610 / X-SAM
[AAAI2026] X-SAM: From Segment Anything to Any Segmentation
☆383 · Jul 14, 2026 · Updated last week
mengcaopku / SpatialDreamer
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
☆15 · Feb 1, 2026 · Updated 5 months ago
Bigtuo / NPU-ais_bench
☆15 · Oct 20, 2024 · Updated last year
lightly-ai / lightly-train
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
☆1,623 · Updated this week
mranzinger / sam3-radio
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading t…
☆30 · Jan 20, 2026 · Updated 6 months ago
Yanhui-Lee / IAD-R1
We propose IAD-R1, a universal post-training framework that enhances Vision-Language Models for industrial anomaly detection through a tw…
☆95 · Dec 9, 2025 · Updated 7 months ago
GinnyXiao / OpenWorldSAM
[NeurIPS 2025 Spotlight] Official repository for the paper: OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language P…
☆46 · Jan 4, 2026 · Updated 6 months ago
facebookresearch / sam3
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading t…
☆11,021 · Jul 15, 2026 · Updated last week
naver-ai / maskris
Official PyTorch implementation of “MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation”
☆18 · Dec 5, 2024 · Updated last year
CVHub520 / X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
☆9,824 · Updated this week
aemior / UMatcher
UMatcher: A modern template matching model
☆88 · May 31, 2025 · Updated last year
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1(CVPR 2026).
☆250 · Apr 17, 2026 · Updated 3 months ago
RADSeg-OVSS / RADSeg
[CVPR'26 Findings] Source code for "RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglom…
☆60 · May 31, 2026 · Updated last month
hustvl / SuperCLIP
☆140 · Dec 26, 2025 · Updated 6 months ago
iSEE-Laboratory / LLMDet
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of La…
☆606 · Feb 4, 2026 · Updated 5 months ago
7HHHHH / VisualAD
VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer (CVPR 2026)
☆108 · Jun 7, 2026 · Updated last month
Aymanbegh / CD-COCO
☆17 · Nov 30, 2023 · Updated 2 years ago
HuiGuanLab / RaTSG
This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos".
☆13 · Aug 22, 2025 · Updated 11 months ago