VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
☆244Mar 12, 2026Updated last week
Alternatives and similar repositories for VLM-FO1
Users who are interested in VLM-FO1 are comparing it to the repositories listed below
- [AAAI 2026] Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter☆38Feb 3, 2026Updated last month
- RefDrone: A Challenging Benchmark for Drone Scene Referring Expression Comprehension☆32Dec 23, 2025Updated 2 months ago
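- Deploying YOLO11 table detection with OpenCV. This is the 2nd-place solution from the Baidu Netdisk AI Competition (table detection) and comprises three modules: table box detection, table corner detection, and table orientation classification. As usual, I wrote both C++ and Python versions of the program☆13Dec 12, 2024Updated last year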
- [ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation☆49Mar 20, 2025Updated last year
- ☆109Aug 14, 2025Updated 7 months ago
- Third place of 2021 IEEE GRSS Data Fusion Contest: Track MSD☆10Mar 31, 2021Updated 4 years ago
- Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision☆143Feb 6, 2026Updated last month
- Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.☆249Aug 12, 2025Updated 7 months ago
- [CVPR2026] Detect Anything via Next Point Prediction☆1,199Feb 22, 2026Updated 3 weeks ago
- ☆30Jan 18, 2026Updated 2 months ago
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.☆31Nov 13, 2025Updated 4 months ago
- Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation☆15Sep 24, 2025Updated 5 months ago
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution☆59Mar 4, 2025Updated last year
- GroundingDINO inference with TensorRT, improving inference speed by more than 3x!☆54Oct 17, 2024Updated last year
- A lightweight and real-time DETR for aerial images detection☆44Mar 22, 2025Updated 11 months ago
- 🙌 OpenHands: Code Less, Make More☆11Jan 8, 2025Updated last year
- Demo for Qwen2.5-VL-3B-Instruct on Axera device.☆16Sep 3, 2025Updated 6 months ago
- ☆29Apr 23, 2020Updated 5 years ago
- Spatial Aptitude Training for Multimodal Language Models☆24Feb 8, 2026Updated last month
- The official repository of "MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description". [ECCV Oral 2024.]☆18Sep 24, 2024Updated last year
- ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation☆27May 27, 2025Updated 9 months ago
- ☆12Jul 11, 2025Updated 8 months ago
- ☆10May 16, 2023Updated 2 years ago
- #ICCV, #MoE, #Tracking☆33Jul 11, 2025Updated 8 months ago
- Meta repository for UWslam dataset