🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆229Jan 4, 2026Updated last month
Alternatives and similar repositories for UniPixel
Users who are interested in UniPixel are comparing it to the libraries listed below
- [ICCV 2025] Official implementation of the paper: "Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Obj…☆75Jul 29, 2025Updated 7 months ago
- OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning☆27May 24, 2025Updated 9 months ago
- ☆68Nov 5, 2025Updated 3 months ago
- Code for paper: Reinforced Vision Perception with Tools☆71Oct 3, 2025Updated 4 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding☆79Dec 14, 2025Updated 2 months ago
- 🚀 Official code for “XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression”, …☆35Jan 27, 2026Updated last month
- SMART-ELE: Smart Digital Twin Substation☆25Aug 10, 2025Updated 6 months ago
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)☆305Feb 8, 2026Updated 3 weeks ago
- The Code of SiCL☆18Nov 5, 2024Updated last year
- [ECCV2024]FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance☆17Sep 11, 2024Updated last year
- ☆14Oct 30, 2023Updated 2 years ago
- Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images☆53Nov 4, 2025Updated 3 months ago
- The official GitHub page for the survey paper "A Survey on LLM Symbolic Reasoning", which is currently under review.☆23Feb 15, 2026Updated 2 weeks ago
- [ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"☆249Oct 31, 2025Updated 4 months ago
- Masked Autoencoders for Unsupervised Anomaly Detection in Medical Images☆20Aug 15, 2023Updated 2 years ago
- ☆113Feb 13, 2026Updated 2 weeks ago
- [AAAI 2025] Explore In-Context Segmentation via Latent Diffusion Models☆22Mar 25, 2025Updated 11 months ago
- ☆29Jun 18, 2025Updated 8 months ago
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆53Feb 10, 2025Updated last year
- ☆22Jul 15, 2024Updated last year
- New generation of CLIP with fine grained discrimination capability, ICML2025☆550Oct 27, 2025Updated 4 months ago
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"☆269Dec 30, 2024Updated last year
- (CVPR 2026) Official repository of paper "WeDetect: Fast Open-Vocabulary Object Detection as Retrieval"☆127Feb 21, 2026Updated last week
- This is an official code repository of ROS-SAM☆68Apr 15, 2025Updated 10 months ago
- Official Repo For Pixel-LLM Codebase☆1,543Jan 23, 2026Updated last month
- Implementation of YOLO and IOU tracker in C++☆18Dec 20, 2021Updated 4 years ago
- [NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…☆268Nov 5, 2025Updated 3 months ago
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding☆64Oct 22, 2024Updated last year
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding☆81Jul 4, 2025Updated 7 months ago
- Vision and Language Reference Prompt into SAM for Few-shot Segmentation☆29Apr 8, 2025Updated 10 months ago
- This repo aims to include materials (papers, codes, slides) about SAM2 (segment anything in images and videos). We are continuously impro…☆142Oct 1, 2025Updated 5 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation"☆127Feb 20, 2025Updated last year
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆103Feb 22, 2026Updated last week
- Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆180Feb 25, 2025Updated last year
- Official Repo of ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet☆30Oct 17, 2024Updated last year
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation"☆123Oct 23, 2025Updated 4 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆114Dec 3, 2025Updated 2 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference.☆28Aug 19, 2024Updated last year
- (ICLR 2026) Unveiling Super Experts in Mixture-of-Experts Large Language Models☆37Sep 25, 2025Updated 5 months ago