Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
☆31May 12, 2026Updated last month
Alternatives and similar repositories for Screen-Point-and-Read
Users that are interested in Screen-Point-and-Read are comparing it to the libraries listed below.
- ☆33Sep 27, 2024Updated last year
- ☆23Oct 11, 2024Updated last year
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆27Aug 20, 2025Updated 9 months ago
- [AAAI2025 Oral] BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking☆15Apr 22, 2025Updated last year
- [WWW2024 Oral] Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering☆15Apr 22, 2025Updated last year
- ☆36May 29, 2025Updated last year
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"☆18Oct 1, 2024Updated last year
- The dataset includes widget captions that describe UI elements' functionalities. It is used for training and evaluation of the widget ca…☆23Jun 24, 2021Updated 4 years ago
- Dataset for Bilingual VLN☆11Dec 5, 2020Updated 5 years ago
- CVPR25☆28Jul 2, 2025Updated 11 months ago
- [ICCV 2025 Highlight] Less is More: Empowering GUI Agent with Context-Aware Simplification☆49Mar 12, 2026Updated 3 months ago
- [ECCV 2022] Official PyTorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"☆13Oct 8, 2022Updated 3 years ago
- ☆17Oct 31, 2023Updated 2 years ago
- The dataset includes screen summaries that describe Android app screenshots' functionalities. It is used for training and evaluation of …☆67Jul 27, 2021Updated 4 years ago
- A hot-pluggable tool for visualizing LLaVA's attention.☆24Jan 29, 2024Updated 2 years ago
- ☆25May 12, 2026Updated last month
- Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)"☆19Oct 25, 2018Updated 7 years ago
- Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arx…☆12Feb 6, 2023Updated 3 years ago
- Implementation of "Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation"☆27Mar 4, 2021Updated 5 years ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆103Oct 14, 2024Updated last year
- ☆10Sep 23, 2024Updated last year
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"☆14Sep 28, 2024Updated last year
- Environments, tools, and benchmarks for general computer agents☆16Dec 3, 2024Updated last year
- Seq2act: Mapping Natural Language Instructions to Mobile UI Action Sequences, from Google Research☆15Jul 13, 2020Updated 5 years ago
- Tools for working with the S800 corpus☆12Sep 17, 2020Updated 5 years ago
- Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"☆69May 12, 2026Updated last month
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025)☆35Jul 21, 2025Updated 10 months ago
- Implementation of KDR-Agent, the AAAI 2025 accepted paper, focusing on knowledge-driven reasoning for autonomous agents.☆21Nov 24, 2025Updated 6 months ago
- Data Release for VALUE Benchmark☆30Feb 16, 2022Updated 4 years ago
- Official PyTorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]☆53Oct 26, 2025Updated 7 months ago
- ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World☆26Jun 17, 2025Updated last year
- Implementation of LayoutGAN https://arxiv.org/abs/1901.06767☆17May 12, 2019Updated 7 years ago
- SotA text-only image/video method (IJCAI 2023)☆15Jan 9, 2024Updated 2 years ago
- Official codebase for the CVPR 2026 paper "Self-Evolving 3D Scene Generation from a Single Image"☆20Dec 15, 2025Updated 6 months ago
- ☆11Sep 18, 2017Updated 8 years ago
- Code and utilities for creating a Vision-and-Language Navigation (VLN) simulator environment from a physical space.☆12Nov 10, 2020Updated 5 years ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin…☆14Jul 27, 2025Updated 10 months ago
- [CVPRW'23 Best Paper Award] Zero-shot Unsupervised Transfer Instance Segmentation☆24Aug 22, 2023Updated 2 years ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆87Oct 26, 2025Updated 7 months ago