Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
☆30 · May 12, 2026 · Updated last week
Alternatives and similar repositories for Screen-Point-and-Read
Users that are interested in Screen-Point-and-Read are comparing it to the libraries listed below.
- ☆32 · Sep 27, 2024 · Updated last year
- ☆23 · Oct 11, 2024 · Updated last year
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e… ☆156 · Jan 3, 2026 · Updated 4 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World". ☆27 · Aug 20, 2025 · Updated 9 months ago
- [AAAI2025 Oral] BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking ☆15 · Apr 22, 2025 · Updated last year
- ☆35 · May 29, 2025 · Updated 11 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆18 · Oct 1, 2024 · Updated last year
- The dataset includes widget captions that describe UI elements' functionalities. It is used for training and evaluation of the widget ca… ☆23 · Jun 24, 2021 · Updated 4 years ago
- VisionDroid ☆22 · Apr 2, 2024 · Updated 2 years ago
- Dataset for Bilingual VLN ☆11 · Dec 5, 2020 · Updated 5 years ago
- CVPR25 ☆27 · Jul 2, 2025 · Updated 10 months ago
- [ECCV 2022] Official PyTorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation" ☆13 · Oct 8, 2022 · Updated 3 years ago
- ORES: Open-vocabulary Responsible Visual Synthesis ☆14 · Dec 12, 2023 · Updated 2 years ago
- ☆17 · Oct 31, 2023 · Updated 2 years ago
- Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022. ☆17 · Aug 30, 2022 · Updated 3 years ago
- The results and code of our IEEE TCYB 2022 paper, titled "Global-and-Local Collaborative Learning for Co-Salient Object Detection" ☆13 · May 2, 2022 · Updated 4 years ago
- Official implementation for ICLR 2023 paper Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation ☆16 · Jan 23, 2024 · Updated 2 years ago
- ☆13 · Feb 24, 2025 · Updated last year
- GUIEvalKit: Open-source Evaluation Toolkit for GUI Agents ☆21 · Feb 26, 2026 · Updated 2 months ago
- ☆25 · May 12, 2026 · Updated last week
- Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)" ☆19 · Oct 25, 2018 · Updated 7 years ago
- Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arx… ☆12 · Feb 6, 2023 · Updated 3 years ago
- Implementation of "Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation" ☆27 · Mar 4, 2021 · Updated 5 years ago
- Implementation of KDR-Agent, the AAAI 2025 accepted paper, focusing on knowledge-driven reasoning for autonomous agents. ☆18 · Nov 24, 2025 · Updated 5 months ago
- ☆17 · Oct 30, 2023 · Updated 2 years ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆102 · Oct 14, 2024 · Updated last year
- ☆37 · Feb 26, 2024 · Updated 2 years ago
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners" ☆14 · Sep 28, 2024 · Updated last year
- ☆22 · Sep 20, 2022 · Updated 3 years ago
- Generative Reranker PyTerrier ☆18 · Dec 1, 2025 · Updated 5 months ago
- ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World ☆25 · Jun 17, 2025 · Updated 11 months ago
- Official PyTorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023] ☆53 · Oct 26, 2025 · Updated 6 months ago
- Implementation of LayoutGAN https://arxiv.org/abs/1901.06767 ☆17 · May 12, 2019 · Updated 7 years ago
- SotA text-only image/video method (IJCAI 2023) ☆15 · Jan 9, 2024 · Updated 2 years ago
- Official codebase for the CVPR 2026 paper "Self-Evolving 3D Scene Generation from a Single Image" ☆20 · Dec 15, 2025 · Updated 5 months ago
- Code and utilities for creating a Vision-and-Language Navigation (VLN) simulator environment from a physical space. ☆12 · Nov 10, 2020 · Updated 5 years ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆54 · Feb 27, 2025 · Updated last year
- [CVPRW'23 Best Paper Award] Zero-shot Unsupervised Transfer Instance Segmentation ☆24 · Aug 22, 2023 · Updated 2 years ago
- The model, data and code for the visual GUI Agent SeeClick ☆480 · Jul 13, 2025 · Updated 10 months ago