🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆237Jan 4, 2026Updated 3 months ago
Alternatives and similar repositories for UniPixel
Users who are interested in UniPixel are comparing it to the libraries listed below.
- [ICCV 2025] Official implementation of the paper: "Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Obj…☆76Jul 29, 2025Updated 8 months ago
- OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning☆28May 24, 2025Updated 10 months ago
- VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning☆332Feb 9, 2026Updated 2 months ago
- Code for paper: Reinforced Vision Perception with Tools☆73Oct 3, 2025Updated 6 months ago
- ☆75Nov 5, 2025Updated 5 months ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding☆84Jul 4, 2025Updated 9 months ago
- ☆31Mar 24, 2026Updated 2 weeks ago
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)☆318Feb 8, 2026Updated 2 months ago
- Official pytorch implementation for SingleInsert☆28Apr 19, 2024Updated last year
- ☆23Jul 15, 2024Updated last year
- InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition (NeurIPS 2025)☆107Feb 28, 2026Updated last month
- ☆14Oct 30, 2023Updated 2 years ago
- Vision and Language Reference Prompt into SAM for Few-shot Segmentation☆30Apr 8, 2025Updated last year
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆55Feb 10, 2025Updated last year
- Differentiable Hierarchical Visual Tokenization☆44Nov 26, 2025Updated 4 months ago
- Visual Grounding with Multi-modal Conditional Adaptation (ACMMM 2024 Oral)☆26Jun 11, 2025Updated 10 months ago
- Official implementation for P2SAM (ACM MM 2024)☆14Dec 7, 2024Updated last year
- (CVPR 2026) Official repository of paper "WeDetect: Fast Open-Vocabulary Object Detection as Retrieval"☆168Feb 21, 2026Updated last month
- [ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"☆255Oct 31, 2025Updated 5 months ago
- This is a project on visual spatial reasoning tasks-SIBench☆26Jan 12, 2026Updated 2 months ago
- [NeurIPS 2025]"DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling"☆99Dec 21, 2025Updated 3 months ago
- [AAAI 2025] Explore In-Context Segmentation via Latent Diffusion Models☆22Mar 25, 2025Updated last year
- ☆19Aug 3, 2024Updated last year
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding☆88Dec 14, 2025Updated 3 months ago
- ☆37Jan 14, 2025Updated last year
- Open-Vocabulary SAM3D: Understand Any 3D Scene☆41Jun 9, 2025Updated 10 months ago
- [NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…☆271Nov 5, 2025Updated 5 months ago
- [ICRA 2025] A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping☆11Feb 7, 2025Updated last year
- 🚀 Official code for “XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression”, …☆41Jan 27, 2026Updated 2 months ago
- Masked Autoencoders for Unsupervised Anomaly Detection in Medical Images☆21Aug 15, 2023Updated 2 years ago
- This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).☆296Feb 17, 2026Updated last month
- [AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding☆126Dec 10, 2024Updated last year
- ☆15Nov 1, 2024Updated last year
- Official implementation of Add-SD: Rational Generation without Manual Reference.☆28Aug 19, 2024Updated last year
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression☆54Feb 25, 2026Updated last month
- Code release for "SegLLM: Multi-round Reasoning Segmentation"☆128Feb 20, 2025Updated last year
- Official implementation of "Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation" (ICCV 2…☆81Aug 5, 2025Updated 8 months ago
- Unified Change Detection Framework☆43May 24, 2025Updated 10 months ago
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation☆32Jan 10, 2026Updated 3 months ago