om-ai-lab / VLM-FO1
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
☆212Updated last month
Alternatives and similar repositories for VLM-FO1
Users who are interested in VLM-FO1 are comparing it to the repositories listed below
- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection☆94Updated 9 months ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆131Updated 6 months ago
- YOLO-UniOW: Efficient Universal Open-World Object Detection☆171Updated 11 months ago
- Includes the VideoCount dataset and CountVid code for the paper Open-World Object Counting in Videos.☆83Updated 3 weeks ago
- Official implementation of RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models☆194Updated last month
- 🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)☆215Updated 2 months ago
- Official code for "No time to train! Training-Free Reference-Based Instance Segmentation"☆268Updated last month
- ☆53Updated 5 months ago
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding☆64Updated last year
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆108Updated last year
- [WACV 2026] Official implementation of the paper: “CountingDINO: A Training-free Pipeline for Exemplar-based Class-Agnostic Counting”☆44Updated last month
- A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space☆96Updated last month
- [ECCV2024] Official implementation of Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes☆95Updated 7 months ago
- The source code of IEEE TPAMI 2025 "Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation".☆117Updated last year
- 🏄 [ICLR 2025] OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer☆84Updated 5 months ago
- [NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)☆107Updated 5 months ago
- Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)☆1,039Updated 3 weeks ago
- [ECCV 2024] Official implementation of "LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction"☆88Updated 2 weeks ago
- [NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…☆262Updated 2 months ago
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion☆384Updated 9 months ago
- [CVPR'25] Official repo of "Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances"☆40Updated 5 months ago
- (CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of La…☆530Updated 2 weeks ago
- A from-scratch (0 to 1) VLM finetune implementation that does not rely on any framework (including Pre-train and SFT)☆35Updated 4 months ago
- [AAAI2026] X-SAM: From Segment Anything to Any Segmentation☆341Updated last month
- Implementation of paper - DEYO: DETR with YOLO for End-to-End Object Detection☆101Updated last year
- Make Large Multimodal Models excel in object detection, ICCV 2025☆61Updated 5 months ago
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024☆87Updated last year
- Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.☆247Updated 4 months ago
- ☆98Updated last week
- [NeurIPS-W 2025] Official Implementation of "Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning"☆57Updated 6 months ago