[CVPR2026] Detect Anything via Next Point Prediction
☆1,320 · Feb 22, 2026 · Updated 2 months ago
Alternatives and similar repositories for Rex-Omni
Users that are interested in Rex-Omni are comparing it to the libraries listed below.
- [ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy ☆2,658 · Oct 15, 2025 · Updated 6 months ago
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion ☆398 · Mar 12, 2025 · Updated last year
- YOLOE: Real-Time Seeing Anything [ICCV 2025] ☆2,121 · Jun 26, 2025 · Updated 10 months ago
- VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs ☆296 · Mar 12, 2026 · Updated last month
- PyTorch implementation of "Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time" ☆54 · Jan 27, 2026 · Updated 3 months ago
- [ICLR-2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning ☆148 · Jun 30, 2025 · Updated 10 months ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆6,322 · Feb 26, 2025 · Updated last year
- [AAAI 2026 Oral] LENS: Learning to Segment Anything with Unified Reinforced Reasoning ☆122 · Dec 3, 2025 · Updated 4 months ago
- Project page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ☆625 · Jan 17, 2026 · Updated 3 months ago
- [ICCV2025] Referring to any person or object given a natural language description. Codebase for RexSeek and the HumanRef Benchmark ☆182 · Oct 15, 2025 · Updated 6 months ago
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding ☆1,366 · Jul 23, 2025 · Updated 9 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆2,246 · Oct 29, 2025 · Updated 6 months ago
- Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and … ☆17,543 · Sep 5, 2024 · Updated last year
- [DEIMv2] Real Time Object Detection Meets DINOv3 ☆1,722 · Mar 24, 2026 · Updated last month
- Solve Visual Understanding with Reinforced VLMs ☆5,950 · Mar 12, 2026 · Updated last month
- Reference PyTorch implementation and models for DINOv3 ☆10,260 · Mar 30, 2026 · Updated last month
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆1,774 · Apr 21, 2026 · Updated last week
- [CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥… ☆5,131 · Mar 2, 2026 · Updated last month
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,260 · Apr 13, 2026 · Updated 2 weeks ago
- [CVPR 2024] Official implementation of the paper "Visual In-context Learning" ☆532 · Apr 8, 2024 · Updated 2 years ago
- [CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale ☆1,172 · Oct 21, 2024 · Updated last year
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ☆10,056 · Aug 12, 2024 · Updated last year
- [NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu… ☆273 · Nov 5, 2025 · Updated 5 months ago
- Official implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything ☆1,371 · May 1, 2025 · Updated last year
- Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud. ☆19,075 · Jan 30, 2026 · Updated 3 months ago
- ☆10 · Feb 14, 2025 · Updated last year
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance. ☆10,003 · Sep 22, 2025 · Updated 7 months ago
- Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting. ☆315 · Jun 25, 2025 · Updated 10 months ago
- (CVPR 2026) Official repository of the paper "WeDetect: Fast Open-Vocabulary Object Detection as Retrieval" ☆189 · Apr 14, 2026 · Updated 2 weeks ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,995 · Nov 7, 2025 · Updated 5 months ago
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ☆180 · Dec 13, 2024 · Updated last year
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,575 · Jun 14, 2025 · Updated 10 months ago
- Open-source and strong foundation image recognition models. ☆3,624 · Feb 18, 2025 · Updated last year
- Official code for "No time to train! Training-Free Reference-Based Instance Segmentation" ☆301 · Apr 14, 2026 · Updated 2 weeks ago
- Official implementation of GeCo2 (AAAI 2026): Generalized-Scale Object Counting with Gradual Query Aggregation ☆131 · Apr 13, 2026 · Updated 2 weeks ago
- Fully Open Framework for Democratized Multimodal Training ☆795 · Dec 27, 2025 · Updated 4 months ago
- Efficient Track Anything ☆798 · Jan 6, 2025 · Updated last year
- [NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning