Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
☆29 · Jul 31, 2024 · Updated last year
Alternatives and similar repositories for Screen-Point-and-Read
Users interested in Screen-Point-and-Read are comparing it to the repositories listed below:
- ☆31 · Sep 27, 2024 · Updated last year
- [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 e… ☆149 · Jan 3, 2026 · Updated 2 months ago
- [WWW2024 Oral] Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering ☆15 · Apr 22, 2025 · Updated 11 months ago
- ☆32 · May 29, 2025 · Updated 9 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆18 · Oct 1, 2024 · Updated last year
- Dataset for Bilingual VLN ☆11 · Dec 5, 2020 · Updated 5 years ago
- CVPR25 ☆27 · Jul 2, 2025 · Updated 8 months ago
- [ICCV 2025 Highlight] Less is More: Empowering GUI Agent with Context-Aware Simplification ☆44 · Mar 12, 2026 · Updated last week
- [ECCV 2022] Official pytorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation" ☆13 · Oct 8, 2022 · Updated 3 years ago
- ORES: Open-vocabulary Responsible Visual Synthesis ☆14 · Dec 12, 2023 · Updated 2 years ago
- ☆17 · Oct 31, 2023 · Updated 2 years ago
- Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022. ☆17 · Aug 30, 2022 · Updated 3 years ago
- The results and code of our IEEE TCYB 2022 paper, titled "Global-and-Local Collaborative Learning for Co-Salient Object Detection" ☆13 · May 2, 2022 · Updated 3 years ago
- Official implementation for ICLR 2023 paper Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation ☆16 · Jan 23, 2024 · Updated 2 years ago
- Chinese translation of Bartosz Milewski's 'Category Theory for Programmers' (《写给程序员的范畴论》); PRs welcome ☆12 · Oct 4, 2024 · Updated last year
- The official implementation of AutoGUI. ☆16 · Oct 6, 2025 · Updated 5 months ago
- Dataset and models for paper "Game-Based Video-Context Dialogue (EMNLP 2018)" ☆19 · Oct 25, 2018 · Updated 7 years ago
- Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arx… ☆12 · Feb 6, 2023 · Updated 3 years ago
- ☆17 · Oct 30, 2023 · Updated 2 years ago
- ☆36 · Feb 26, 2024 · Updated 2 years ago
- Seq2act: Mapping Natural Language Instructions to Mobile UI Action Sequences from Google research ☆15 · Jul 13, 2020 · Updated 5 years ago
- This is the repository to the article "NEWBEE: A Multi-Modal Gait Database of Natural Everyday-Walk in an Urban Environment", 2022 ☆11 · Aug 2, 2022 · Updated 3 years ago
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025) ☆35 · Jul 21, 2025 · Updated 8 months ago
- Generative Reranker PyTerrier ☆18 · Dec 1, 2025 · Updated 3 months ago
- ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World ☆25 · Jun 17, 2025 · Updated 9 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ☆51 · Feb 13, 2025 · Updated last year
- ☆11 · Sep 18, 2017 · Updated 8 years ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆52 · Feb 27, 2025 · Updated last year
- ☆20 · Dec 15, 2025 · Updated 3 months ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin… ☆13 · Jul 27, 2025 · Updated 7 months ago
- ☆20 · Apr 2, 2024 · Updated last year
- ☆45 · Mar 19, 2024 · Updated 2 years ago
- The model, data and code for the visual GUI Agent SeeClick ☆472 · Jul 13, 2025 · Updated 8 months ago
- [CVPRW'23 Best Paper Award] Zero-shot Unsupervised Transfer Instance Segmentation ☆24 · Aug 22, 2023 · Updated 2 years ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆87 · Oct 26, 2025 · Updated 4 months ago
- Video-Language Continual Learning Benchmark ☆20 · Oct 30, 2024 · Updated last year
- This is the public repository of AAAI 2024 paper "Is a Large Language Model a Good Annotator for Event Extraction" ☆10 · Feb 16, 2024 · Updated 2 years ago
- This repository contains code for the paper RMM: A Recursive Mental Model for Dialog Navigation ☆10 · Nov 22, 2022 · Updated 3 years ago
- ☆14 · Jan 8, 2025 · Updated last year