Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
☆30May 12, 2026Updated 2 weeks ago
Alternatives and similar repositories for Screen-Point-and-Read
Users that are interested in Screen-Point-and-Read are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆32Sep 27, 2024Updated last year
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e…☆158Jan 3, 2026Updated 4 months ago
- [AAAI2025 Oral] BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking☆15Apr 22, 2025Updated last year
- [WWW2024 Oral] Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering☆15Apr 22, 2025Updated last year
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"☆18Oct 1, 2024Updated last year
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- The dataset includes widget captions that describes UI element's functionalities. It is used for training and evaluation of the widget ca…☆23Jun 24, 2021Updated 4 years ago
- VisionDroid☆22Apr 2, 2024Updated 2 years ago
- CVPR25☆28Jul 2, 2025Updated 10 months ago
- [ECCV 2022] Official pytorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"☆13Oct 8, 2022Updated 3 years ago
- ORES: Open-vocabulary Responsible Visual Synthesis☆14Dec 12, 2023Updated 2 years ago
- Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022.☆17Aug 30, 2022Updated 3 years ago
- Official implementation for ICLR 2023 paper Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation☆16Jan 23, 2024Updated 2 years ago
- A hot-pluggable tool for visualizing LLaVA's attention.☆24Jan 29, 2024Updated 2 years ago
- The official implementation of AutoGUI.☆17Oct 6, 2025Updated 7 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- GUIEvalKit: Open-source Evaluation Toolkit for GUI Agents☆22Feb 26, 2026Updated 3 months ago
- ☆25May 12, 2026Updated 2 weeks ago
- Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arx…☆12Feb 6, 2023Updated 3 years ago
- Implementation of "Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation"☆27Mar 4, 2021Updated 5 years ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆103Oct 14, 2024Updated last year
- ☆38Feb 26, 2024Updated 2 years ago
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"☆14Sep 28, 2024Updated last year
- Environments, tools, and benchmarks for general computer agents☆15Dec 3, 2024Updated last year
- Tools for working with the S800 corpus☆12Sep 17, 2020Updated 5 years ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"☆67May 12, 2026Updated 2 weeks ago
- ☆22Sep 20, 2022Updated 3 years ago
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025)☆35Jul 21, 2025Updated 10 months ago
- Implementation of KDR-Agent, the AAAI 2025 accepted paper, focusing on knowledge-driven reasoning for autonomous agents.☆20Nov 24, 2025Updated 6 months ago
- ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World☆25Jun 17, 2025Updated 11 months ago
- Implementation of LayoutGAN https://arxiv.org/abs/1901.06767☆17May 12, 2019Updated 7 years ago
- SotA text-only image/video method (IJCAI 2023)☆15Jan 9, 2024Updated 2 years ago
- Official codebase for the CVPR 2026 paper "Self-Evolving 3D Scene Generation from a Single Image"☆20Dec 15, 2025Updated 5 months ago
- ☆11Sep 18, 2017Updated 8 years ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Code and utilities for creating a Vision-and-Language Navigation (VLN) simulator environment from a physical space.☆12Nov 10, 2020Updated 5 years ago
- ☆46Mar 19, 2024Updated 2 years ago
- [CVPRW'23 Best Paper Award] Zero-shot Unsupervised Transfer Instance Segmentation☆24Aug 22, 2023Updated 2 years ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆87Oct 26, 2025Updated 7 months ago
- The model, data and code for the visual GUI Agent SeeClick☆482Jul 13, 2025Updated 10 months ago
- Video-Language Continual Learning Benchmark☆20Oct 30, 2024Updated last year
- ☆21Apr 2, 2024Updated 2 years ago