Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
☆29Jul 31, 2024Updated last year
Alternatives and similar repositories for Screen-Point-and-Read
Users who are interested in Screen-Point-and-Read are comparing it to the repositories listed below.
- ☆32Sep 27, 2024Updated last year
- ☆23Oct 11, 2024Updated last year
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…☆156Jan 3, 2026Updated 3 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆27Aug 20, 2025Updated 8 months ago
- [WWW2024 Oral] Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering☆15Apr 22, 2025Updated last year
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"☆18Oct 1, 2024Updated last year
- The dataset includes widget captions that describe UI elements' functionalities. It is used for training and evaluation of the widget ca…☆23Jun 24, 2021Updated 4 years ago
- VisionDroid☆22Apr 2, 2024Updated 2 years ago
- Dataset for Bilingual VLN☆11Dec 5, 2020Updated 5 years ago
- CVPR25☆28Jul 2, 2025Updated 9 months ago
- [ECCV 2022] Official pytorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"☆13Oct 8, 2022Updated 3 years ago
- ORES: Open-vocabulary Responsible Visual Synthesis☆14Dec 12, 2023Updated 2 years ago
- ☆17Oct 31, 2023Updated 2 years ago
- Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022.☆17Aug 30, 2022Updated 3 years ago
- The dataset includes screen summaries that describe Android app screenshots' functionalities. It is used for training and evaluation of …☆67Jul 27, 2021Updated 4 years ago
- The results and code of our IEEE TCYB 2022 paper, titled "Global-and-Local Collaborative Learning for Co-Salient Object Detection"☆13May 2, 2022Updated 3 years ago
- Official implementation for ICLR 2023 paper Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation☆16Jan 23, 2024Updated 2 years ago
- A hot-pluggable tool for visualizing LLaVA's attention.☆24Jan 29, 2024Updated 2 years ago
- ☆18Sep 10, 2025Updated 7 months ago
- GUIEvalKit: Open-source Evaluation Toolkit for GUI Agents☆19Feb 26, 2026Updated 2 months ago
- Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)"☆19Oct 25, 2018Updated 7 years ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆101Oct 14, 2024Updated last year
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"☆14Sep 28, 2024Updated last year
- Environments, tools, and benchmarks for general computer agents☆15Dec 3, 2024Updated last year
- Tools for working with the S800 corpus☆12Sep 17, 2020Updated 5 years ago
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025)☆35Jul 21, 2025Updated 9 months ago
- Generative Reranker PyTerrier☆18Dec 1, 2025Updated 4 months ago
- Data Release for VALUE Benchmark☆30Feb 16, 2022Updated 4 years ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆51Feb 13, 2025Updated last year
- Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]☆53Oct 26, 2025Updated 6 months ago
- SotA text-only image/video method (IJCAI 2023)☆15Jan 9, 2024Updated 2 years ago
- ☆11Sep 18, 2017Updated 8 years ago
- ☆20Dec 15, 2025Updated 4 months ago
- ☆46Mar 19, 2024Updated 2 years ago
- A curated collection of resources, tools, and frameworks for developing GUI Agents.☆409Apr 16, 2026Updated 2 weeks ago
- The model, data and code for the visual GUI Agent SeeClick☆478Jul 13, 2025Updated 9 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆88Oct 26, 2025Updated 6 months ago
- This repository contains code for the paper RMM: A Recursive Mental Model for Dialog Navigation☆10Nov 22, 2022Updated 3 years ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation☆67Aug 9, 2024Updated last year