showlab/FocusUI

showlab / FocusUI

[CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

☆35

Alternatives and similar repositories for FocusUI

Users who are interested in FocusUI are comparing it to the repositories listed below.

video-reality-test / video-reality-test
☆23 · May 5, 2026 · Updated 2 months ago
showlab / showui-pi
[CVPR 2026] ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
☆128 · Apr 22, 2026 · Updated 2 months ago
showlab / Adv-GRPO
[CVPR 2026] An official implementation of Adv-GRPO. The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image…
☆88 · Feb 26, 2026 · Updated 4 months ago
showlab / EVOLVE-VLA
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
☆87 · Dec 17, 2025 · Updated 7 months ago
showlab / AUI
Computer-Use Agents as Judges for Generative UI
☆44 · Nov 27, 2025 · Updated 7 months ago
microsoft / GUI-Actor
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
☆410 · Apr 13, 2026 · Updated 3 months ago
showlab / EvolveDirector
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
☆52 · Oct 14, 2024 · Updated last year
nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Apr 18, 2026 · Updated 3 months ago
showlab / FQGAN
FQGAN: Factorized Visual Tokenization and Generation
☆59 · Mar 29, 2025 · Updated last year
showlab / ROICtrl
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
☆110 · Apr 16, 2025 · Updated last year
showlab / GUI-Narrator
Repository of GUI Action Narrator
☆13 · Apr 8, 2025 · Updated last year
TongUI-agent / TongUI-agent
[AAAI 2026] Release of code, datasets and model for our work TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for General…
☆114 · Dec 1, 2025 · Updated 7 months ago
nusnlp / SlideTailor
[AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers
☆57 · Apr 18, 2026 · Updated 3 months ago
showlab / HOSNeRF
This is the project page for the HOSNeRF
☆16 · Dec 11, 2023 · Updated 2 years ago
showlab / SMS
[ICCV 2025] Balanced Image Stylization with Style Matching Score
☆69 · Mar 9, 2026 · Updated 4 months ago
showlab / MakeAnything
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
☆211 · Apr 1, 2025 · Updated last year
showlab / DoraCycle
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
☆31 · Mar 8, 2026 · Updated 4 months ago
showlab / GEB-Plus
[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
☆17 · Aug 24, 2022 · Updated 3 years ago
showlab / TPDiff
TPDiff: Temporal Pyramid Video Diffusion Model
☆25 · Mar 13, 2025 · Updated last year
tiangeluo / RegionFocus
A simple visual test-time scaling method for GUI agent grounding
☆26 · Dec 7, 2025 · Updated 7 months ago
Lexiang-Xiong / CAD
[ECCV 2026] Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models
☆28 · Jun 20, 2026 · Updated 3 weeks ago
penghao-wu / GUI_Reflection
☆34 · Sep 19, 2025 · Updated 10 months ago
Zeyu1226-mt / LLM-IAVC
[ICCV 2025] "Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning".
☆23 · Dec 11, 2025 · Updated 7 months ago
UCSC-VLAA / VLAA-GUI
Official implementation of VLAA-GUI series
☆34 · Jun 20, 2026 · Updated last month
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year
penincillin / SDF_ihmr
Adapted signed distance function (SDF) for detecting collisions between 3D interacting hands.
☆15 · Mar 16, 2022 · Updated 4 years ago
showlab / Q2A
[ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
☆23 · Jan 30, 2026 · Updated 5 months ago
YXB-NKU / SE-GUI
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
☆107 · Oct 21, 2025 · Updated 8 months ago
showlab / VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆148 · Dec 26, 2024 · Updated last year
alchemistyzz / PeRL
[NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"
☆30 · Mar 30, 2026 · Updated 3 months ago
BaohaoLiao / SAGE
Self-Hinting Language Models Enhance Reinforcement Learning
☆26 · Mar 28, 2026 · Updated 3 months ago
showlab / Kiwi-Edit
A unified and fully open-source framework for instruction-guided and reference-guided video editing using natural language.
☆304 · May 13, 2026 · Updated 2 months ago
lose4578 / CircleRoPE
☆15 · Sep 1, 2025 · Updated 10 months ago
qiujihao19 / LongVideo-R1
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
☆50 · Jul 7, 2026 · Updated last week
fscdc / dVoting
[arXiv 2026] dVoting: Fast Voting for dLLMs
☆30 · Feb 13, 2026 · Updated 5 months ago
NVlabs / AnyFlow
Flow Map OPD for AnyStep Video Diffusion
☆394 · May 23, 2026 · Updated last month
WenyiWU0111 / CoMEM-Agent
Official repository for paper Auto-scaling Continuous Memory for GUI Agent
☆29 · Feb 2, 2026 · Updated 5 months ago
huggingface / screensuite
ScreenSuite - The most comprehensive benchmarking suite for GUI Agents!
☆144 · May 26, 2026 · Updated last month
vivo / DiMo-GUI
[EMNLP 2025] Repository for paper "DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning"
☆30 · Jul 2, 2025 · Updated last year