Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
52 stars · Feb 4, 2026 · Updated last month
Alternatives and similar repositories for READ
Users that are interested in READ are comparing it to the libraries listed below.
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" · 47 stars · Jun 16, 2024 · Updated last year
- [CVPR 2026] STAMP: Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction · 35 stars · Feb 21, 2026 · Updated last month
- The official implementation code for Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater. · 31 stars · Jan 28, 2026 · Updated last month
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of… · 198 stars · Jan 21, 2026 · Updated 2 months ago
- Video Reasoning Segmentation · 28 stars · Nov 29, 2024 · Updated last year
- Code and Datasets for the NeurIPS24 Paper "Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets" · 20 stars · Feb 16, 2026 · Updated last month
- [MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology · 12 stars · Jun 17, 2025 · Updated 9 months ago
- code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation" · 23 stars · Nov 24, 2025 · Updated 4 months ago
- 19 stars · Oct 23, 2024 · Updated last year
- 14 stars · Jul 17, 2024 · Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model · 208 stars · Aug 5, 2024 · Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos · 99 stars · Apr 14, 2025 · Updated 11 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation · 77 stars · Sep 19, 2025 · Updated 6 months ago
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". · 12 stars · Oct 11, 2024 · Updated last year
- [ICLR 2025] Official Pytorch Implementation of MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segm… · 25 stars · Apr 3, 2025 · Updated 11 months ago
- 21 stars · Mar 18, 2026 · Updated last week
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos · 146 stars · Dec 26, 2024 · Updated last year
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation · 55 stars · Jan 22, 2025 · Updated last year
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision · 12 stars · Sep 17, 2023 · Updated 2 years ago
- 23 stars · Jul 15, 2024 · Updated last year
- High Quality Video Reasoning Segmentation · 148 stars · Nov 24, 2025 · Updated 4 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories · 94 stars · Aug 8, 2025 · Updated 7 months ago
- [CVPR-25🔥] Test-time Counterattacks (TTC) towards adversarial robustness of CLIP · 40 stars · Jun 4, 2025 · Updated 9 months ago
- The PyTorch implementation for "DEAL: Disentangle and Localize Concept-level Explanations for VLMs" (ECCV 2024 Strong Double Blind) · 20 stars · Mar 9, 2026 · Updated 2 weeks ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… · 68 stars · Jun 23, 2025 · Updated 9 months ago
- [CVPR 2024] SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds · 36 stars · Jul 19, 2025 · Updated 8 months ago
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer · 16 stars · Sep 7, 2024 · Updated last year
- Code release for "SegLLM: Multi-round Reasoning Segmentation" · 128 stars · Feb 20, 2025 · Updated last year
- This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos" · 14 stars · Aug 22, 2025 · Updated 7 months ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking · 13 stars · May 3, 2024 · Updated last year
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024] · 14 stars · Jul 11, 2024 · Updated last year
- [CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models · 18 stars · Jul 22, 2024 · Updated last year
- Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding". · 39 stars · Jun 9, 2025 · Updated 9 months ago
- The official code for paper "Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection" (CVPR 2025) · 26 stars · Aug 15, 2025 · Updated 7 months ago
- [CVPR 2025] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding · 17 stars · Oct 4, 2025 · Updated 5 months ago
- ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation · 28 stars · May 27, 2025 · Updated 10 months ago
- Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding (CVPR 2025 Oral) · 39 stars · Nov 28, 2025 · Updated 3 months ago
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability · 17 stars · May 8, 2025 · Updated 10 months ago
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment · 27 stars · Jun 11, 2025 · Updated 9 months ago