[CVPR2026] Detect Anything via Next Point Prediction
☆1,199Feb 22, 2026Updated 3 weeks ago
Alternatives and similar repositories for Rex-Omni
Users that are interested in Rex-Omni are comparing it to the libraries listed below
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy☆2,641Oct 15, 2025Updated 5 months ago
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion☆400Mar 12, 2025Updated last year
- YOLOE: Real-Time Seeing Anything [ICCV 2025]☆2,075Jun 26, 2025Updated 8 months ago
- [AAAI 2026 Oral] LENS: Learning to Segment Anything with Unified Reinforced Reasoning☆109Dec 3, 2025Updated 3 months ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆146Jun 30, 2025Updated 8 months ago
- VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs☆244Mar 12, 2026Updated last week
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection☆6,253Feb 26, 2025Updated last year
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"☆614Jan 17, 2026Updated 2 months ago
- [ICCV2025] Referring to any person or object given a natural language description. Code base for RexSeek and HumanRef Benchmark☆178Oct 15, 2025Updated 5 months ago
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding☆1,345Jul 23, 2025Updated 7 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'☆2,317Oct 29, 2025Updated 4 months ago
- pytorch implementation of "Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time"☆48Jan 27, 2026Updated last month
- Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and …☆17,464Sep 5, 2024Updated last year
- Solve Visual Understanding with Reinforced VLMs☆5,872Mar 12, 2026Updated last week
- Reference PyTorch implementation and models for DINOv3☆9,878Mar 11, 2026Updated last week
- Official repository for "AM-RADIO: Reduce All Domains Into One"☆1,706Feb 11, 2026Updated last month
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning"☆531Apr 8, 2024Updated last year
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale☆1,171Oct 21, 2024Updated last year
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!☆2,204Mar 12, 2026Updated last week
- [DEIMv2] Real Time Object Detection Meets DINOv3☆1,575Jan 7, 2026Updated 2 months ago
- [NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…☆269Nov 5, 2025Updated 4 months ago
- [CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥…☆4,940Mar 2, 2026Updated 2 weeks ago
- D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]☆3,060Jan 5, 2026Updated 2 months ago
- Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.☆309Jun 25, 2025Updated 8 months ago
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything☆1,365May 1, 2025Updated 10 months ago
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型☆9,904Sep 22, 2025Updated 5 months ago
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"☆9,867Aug 12, 2024Updated last year
- ☆96Jan 18, 2026Updated 2 months ago
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".☆180Dec 13, 2024Updated last year
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.☆18,671Jan 30, 2026Updated last month
- Fully Open Framework for Democratized Multimodal Training☆770Dec 27, 2025Updated 2 months ago
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams☆47Updated this week
- Official code for "No time to train! Training-Free Reference-Based Instance Segmentation"☆295Feb 20, 2026Updated last month
- Efficient Track Anything☆788Jan 6, 2025Updated last year
- Open-source and strong foundation image recognition models.☆3,604Feb 18, 2025Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.☆1,990Nov 7, 2025Updated 4 months ago
- [ICRA 2025] Official repository for "UASTHN: Uncertainty-Aware Deep Homography Estimation for UAV Satellite-Thermal Geo-localization"☆21Feb 28, 2026Updated 3 weeks ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…☆1,555Jun 14, 2025Updated 9 months ago
- [ICLR2026] Official Implementation of "Dens3R: A Foundation Model for 3D Geometry Prediction"☆371Sep 29, 2025Updated 5 months ago