IDEA-Research / Rex-Omni
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
☆1,128 · Jan 25, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for Rex-Omni
Users interested in Rex-Omni are comparing it to the repositories listed below.
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ☆2,630 · Oct 15, 2025 · Updated 3 months ago
- YOLOE: Real-Time Seeing Anything [ICCV 2025] ☆2,029 · Jun 26, 2025 · Updated 7 months ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning ☆142 · Jun 30, 2025 · Updated 7 months ago
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion ☆398 · Mar 12, 2025 · Updated 11 months ago
- VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs ☆237 · Nov 28, 2025 · Updated 2 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ☆597 · Jan 17, 2026 · Updated 3 weeks ago
- [ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark ☆177 · Oct 15, 2025 · Updated 3 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆6,208 · Feb 26, 2025 · Updated 11 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆2,316 · Oct 29, 2025 · Updated 3 months ago
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆1,334 · Jul 23, 2025 · Updated 6 months ago
- Solve Visual Understanding with Reinforced VLMs ☆5,833 · Oct 21, 2025 · Updated 3 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ☆210 · Oct 15, 2025 · Updated 3 months ago
- ☆35 · Sep 29, 2025 · Updated 4 months ago
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ☆529 · Apr 8, 2024 · Updated last year
- [CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥… ☆4,841 · Dec 3, 2025 · Updated 2 months ago
- Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and … ☆17,397 · Sep 5, 2024 · Updated last year
- Reference PyTorch implementation and models for DINOv3 ☆9,525 · Nov 20, 2025 · Updated 2 months ago
- ☆10 · Feb 14, 2025 · Updated last year
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆1,634 · Updated this week
- [DEIMv2] Real Time Object Detection Meets DINOv3 ☆1,478 · Jan 7, 2026 · Updated last month
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale ☆1,169 · Oct 21, 2024 · Updated last year
- Fully Open Framework for Democratized Multimodal Training ☆718 · Dec 27, 2025 · Updated last month
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o (an open-source multimodal dialogue model approaching GPT-4o performance) ☆9,792 · Sep 22, 2025 · Updated 4 months ago
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ☆9,694 · Aug 12, 2024 · Updated last year
- UMatcher: A modern template matching model ☆78 · May 31, 2025 · Updated 8 months ago
- Efficient Track Anything ☆776 · Jan 6, 2025 · Updated last year
- D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight] ☆3,008 · Jan 5, 2026 · Updated last month
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆18,273 · Jan 30, 2026 · Updated 2 weeks ago
- [CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence ☆1,428 · Sep 26, 2025 · Updated 4 months ago
- New generation of CLIP with fine grained discrimination capability, ICML2025 ☆545 · Oct 27, 2025 · Updated 3 months ago
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,448 · Jun 26, 2025 · Updated 7 months ago
- ☆80 · Jan 18, 2026 · Updated 3 weeks ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,544 · Jun 14, 2025 · Updated 8 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆123 · Jan 30, 2026 · Updated 2 weeks ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,362 · May 1, 2025 · Updated 9 months ago
- Open-source and strong foundation image recognition models. ☆3,589 · Feb 18, 2025 · Updated 11 months ago
- [AAAI 2026] Official implementation of the paper "SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D F… ☆24 · Jan 8, 2026 · Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,985 · Nov 7, 2025 · Updated 3 months ago
- A Toolkit to Help Optimize Large Onnx Model ☆164 · Oct 26, 2025 · Updated 3 months ago