Gorilla-Lab-SCUT/PaDT

[ICLR 2026] Official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"

☆162

Alternatives and similar repositories for PaDT

Users who are interested in PaDT are comparing it to the repositories listed below.

PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆247 · Jan 4, 2026 · Updated 6 months ago
IDEA-Research / Rex-Omni
[CVPR2026] Detect Anything via Next Point Prediction
☆1,516 · Feb 22, 2026 · Updated 5 months ago
rui-qian / READ
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
☆54 · Feb 4, 2026 · Updated 5 months ago
rui-qian / UGround
Rui Qian, Xin Yin, Chuanhang Deng, et al.: UGround: Towards Unified Visual Grounding with Unrolled Transformers (ICML 2026)
☆29 · Jun 18, 2026 · Updated last month
om-ai-lab / VLM-FO1
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
☆329 · Jun 18, 2026 · Updated last month
yayafengzi / LMM-HiMTok
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
☆97 · Jul 17, 2025 · Updated last year
JIA-Lab-research / VisionReasoner
[ICLR 2026] VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
☆348 · Feb 9, 2026 · Updated 5 months ago
iSEE-Laboratory / LLMDet
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of La…
☆607 · Feb 4, 2026 · Updated 5 months ago
Hectormxy / OP-SAM
The official implementation of ICCV 25 OP-SAM "One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Ite…
☆15 · Jul 9, 2025 · Updated last year
JIA-Lab-research / Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆635 · Jan 17, 2026 · Updated 6 months ago
YuHengsss / SD-RPN
[ICLR2026] Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
☆17 · Jan 26, 2026 · Updated 5 months ago
yayafengzi / ALToLLM
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
☆30 · May 27, 2025 · Updated last year
WeChatCV / WeDetect
(CVPR 2026) Official repository of paper "WeDetect: Fast Open-Vocabulary Object Detection as Retrieval"
☆244 · Jun 7, 2026 · Updated last month
Gorilla-Lab-SCUT / TRIBE
[AAAI 2024] Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization
☆30 · Apr 8, 2025 · Updated last year
wanghao9610 / X-SAM
[AAAI2026] X-SAM: From Segment Anything to Any Segmentation
☆384 · Jul 14, 2026 · Updated last week
yinghemedical / U-VLM
U-VLM: Hierarchical Vision Language Modeling for Report Generation
☆19 · Apr 30, 2026 · Updated 2 months ago
HKUST-LongGroup / STAMP
[CVPR 2026] STAMP: Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
☆39 · Feb 21, 2026 · Updated 5 months ago
PGSmall / ConceptBank
Official code for "Taming SAM3 in the Wild: A Concept Bank for Open-Vocabulary Segmentation"
☆48 · Mar 3, 2026 · Updated 4 months ago
nnnth / UFO
[NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…
☆281 · Nov 5, 2025 · Updated 8 months ago
360CVGroup / LMM-Det
Make Large Multimodal Models excel in object detection, ICCV 2025
☆65 · Aug 1, 2025 · Updated 11 months ago
mc-lan / Text4Seg
[ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation
☆177 · Nov 8, 2025 · Updated 8 months ago
WeChatCV / ObjEmbed
(ICML 2026) Official repository of paper "ObjEmbed: Towards Universal Multimodal Object Embeddings"
☆51 · May 18, 2026 · Updated 2 months ago
Gorilla-Lab-SCUT / TTAC2
[TPAMI 2024] The official implementation of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clu…
☆13 · Mar 19, 2024 · Updated 2 years ago
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
MSIIP / Connector-S
☆13 · Apr 30, 2025 · Updated last year
DanielSHKao / ThinkFirst
Official implementation for "Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts"
☆22 · Jun 28, 2025 · Updated last year
Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆2,263 · Oct 29, 2025 · Updated 8 months ago
ml-research / deictic-segment-anything
Segment Anything with Deictic Prompting
☆27 · May 13, 2025 · Updated last year
Visual-Agent / DeepEyes
☆1,250 · Nov 20, 2025 · Updated 8 months ago
Haochen-Wang409 / Grasp-Any-Region
[ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
☆99 · Jan 26, 2026 · Updated 5 months ago
JiazuoYu / Fines
Code for paper "FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning" (NeurIPS 2025).
☆15 · Jan 29, 2026 · Updated 5 months ago
mc-lan / Awesome-MLLM-Segmentation
A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…
☆231 · Jun 28, 2026 · Updated 3 weeks ago
zhang-haojie / MuSS
A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation
☆30 · Jun 9, 2026 · Updated last month
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆301 · Jun 4, 2026 · Updated last month
bytedance / Sa2VA
Official Repo For Pixel-LLM Codebase: Sa2VA (Arxiv-25), SAMTok (CVPR-26), VRT, SaSaSa2VA (1st-place solution for LSVOS)
☆1,650 · Jun 19, 2026 · Updated last month
PKU-YuanGroup / UniSandBox
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆60 · Nov 27, 2025 · Updated 7 months ago
debby-0527 / SAM3-I
Official code and resources for SAM3-I.
☆175 · Apr 14, 2026 · Updated 3 months ago
CUHK-AIM-Group / MCPL
MCPL: Multi-modal Collaborative Prompt Learning for Medical Vision-Language Model (Initial Version)
☆13 · Apr 17, 2024 · Updated 2 years ago
Tanveer81 / RGNet
This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos
☆20 · Mar 3, 2025 · Updated last year