Haochen-Wang409/Grasp-Any-Region

Haochen-Wang409 / Grasp-Any-Region

[ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

☆97

Alternatives and similar repositories for Grasp-Any-Region

Users who are interested in Grasp-Any-Region compare it to the repositories listed below.

VoyageWang / VG-Refiner
The repository of VG-Refiner paper
☆17 · Dec 9, 2025 · Updated 2 months ago
zsxkib / ST-MFNet
[IEEE/CVF CVPR'2022] "ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation", Duolikun Danier, Fan Zhang, David Bull
☆13 · Oct 9, 2023 · Updated 2 years ago
LouisFinner / HiM2SAM
This is the official implementation of work HiM2SAM in PRCV25.
☆25 · Aug 30, 2025 · Updated 6 months ago
Apollo-Level2-Web-Dev / B6A1
☆28 · Nov 17, 2025 · Updated 3 months ago
SkyworkAI / DAQ-VS
Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]
☆14 · Jul 11, 2024 · Updated last year
intellegix / intellegix-code-agent-toolkit
Automated loop driver, slash commands, council automation, MCP browser bridge, and portfolio governance for Claude Code CLI
☆52 · Feb 26, 2026 · Updated last week
benjcooley / dungeongen
Dungeon procedural generator similar to whatabou's "One Page Dungeon"
☆50 · Jan 4, 2026 · Updated 2 months ago
R3gm / ConversaDocs
Program that enables seamless interaction with your documents through an advanced vector database and the power of Large Language Model (…
☆18 · Sep 12, 2023 · Updated 2 years ago
zhouyiks / CoLVA
☆43 · Jul 9, 2025 · Updated 7 months ago
longmalongma / TW-GRPO
The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"
☆35 · Jun 12, 2025 · Updated 8 months ago
ai-forever / VIBE
☆49 · Feb 9, 2026 · Updated 3 weeks ago
cwlee00 / MFP
[CVPR 2024] MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
☆17 · Jul 8, 2024 · Updated last year
laulampaul / text-animator
☆20 · Jun 26, 2024 · Updated last year
XJTLUSURF20240123 / EmoMA-Net
☆11 · Sep 12, 2025 · Updated 5 months ago
lxtGH / DenseWorld-1M
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
☆127 · Oct 2, 2025 · Updated 5 months ago
OvidijusParsiunas / web-llm
Bringing large-language models and chat to web browsers. Everything runs inside the browser with no server support.
☆16 · Feb 4, 2024 · Updated 2 years ago
sdbds / florence2-ft-advanced
finetune your florence2 model easy
☆21 · Jul 27, 2024 · Updated last year
ViTAE-Transformer / SAMText
The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting"
☆16 · May 3, 2023 · Updated 2 years ago
kohya-ss / HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
☆48 · Dec 14, 2024 · Updated last year
paintscene4d / paintscene4d.github.io
☆25 · Mar 30, 2025 · Updated 11 months ago
MacavityT / REF-VLM
☆30 · Jan 18, 2026 · Updated last month
zhang-tao-whu / DVIS_Plus
☆135 · Jul 4, 2024 · Updated last year
Apple-jun / FilmComposer
Music production for silent film clips.
☆32 · Apr 30, 2025 · Updated 10 months ago
EnVision-Research / TiViBench
[CVPR 2026] TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
☆64 · Feb 21, 2026 · Updated last week
ictnlp / ComSpeech
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
☆25 · Jul 2, 2024 · Updated last year
NOVA-3D-Anime-Character-Synthesis / NOVA-3D
NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction
☆26 · Mar 14, 2024 · Updated last year
aigc3d / LAM_WebRender
A lightweight WebGL Render for LAM and LAM_Audio2Expression
☆51 · Dec 25, 2025 · Updated 2 months ago
jaechanjo / TIFF
Text-Guided Generation of Full-Body Image with Preserved Reference Face for Customized Animation
☆24 · Jun 24, 2024 · Updated last year
Lzq5 / UniTime
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆46 · Nov 25, 2025 · Updated 3 months ago
yandex-research / vqdm
Official repository for VQDM: Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization paper
☆34 · Sep 17, 2024 · Updated last year
myyzzzoooo / InsertAnywhere
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
☆82 · Dec 27, 2025 · Updated 2 months ago
eric-ai-lab / via-video
☆26 · Jun 20, 2024 · Updated last year
qishisuren123 / AnyCap
A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…
☆52 · Jul 24, 2025 · Updated 7 months ago
Diffusion-CoT / ReflectionFlow
[ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
☆217 · Nov 5, 2025 · Updated 4 months ago
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆66 · Jun 28, 2024 · Updated last year
lxtGH / TemporalPyramidRouting
Temporal Pyramid Routing For Video Instance Segmentation-T-PAMI-2022
☆25 · Jul 6, 2023 · Updated 2 years ago
Xrvitd / ComboStoc
Code of ComboStoc, the diffusion models and training/sampling code for our paper exploring the Combinatorial Stochasticity for Diffusion …
☆31 · May 16, 2025 · Updated 9 months ago
VolkerMuehlhaus / RFIC-Inductor-Toolkit-Open
RFIC Inductor Toolkit for ADS, Open Source Version
☆64 · Aug 28, 2025 · Updated 6 months ago
ruohaoguo / ovavss
Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].
☆35 · Nov 2, 2024 · Updated last year