chuyg1005 / seeclick-crawler
☆15Updated 9 months ago
Alternatives and similar repositories for seeclick-crawler:
Users who are interested in seeclick-crawler are comparing it to the repositories listed below
- ☆28Updated 4 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆100Updated 7 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control☆55Updated last month
- [NeurIPS2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI☆92Updated 2 months ago
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073☆26Updated 7 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆49Updated 4 months ago
- (ICLR 2025) The Official Code Repository for GUI-World.☆47Updated 2 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆44Updated last month
- A Universal Platform for Training and Evaluation of Mobile Interaction☆41Updated 2 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"☆51Updated 8 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆21Updated 2 months ago
- The code and data for the paper JiuZhang3.0☆40Updated 8 months ago
- ☆58Updated 5 months ago
- A curated list of papers, repositories, tutorials, and anything else related to large language models for tools☆68Updated last year
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr…☆89Updated 3 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025)☆25Updated 2 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆48Updated 2 months ago
- ☆31Updated 8 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆46Updated last year
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆41Updated last week
- Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆103Updated 3 weeks ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆116Updated 3 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆48Updated 2 weeks ago
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion☆49Updated 11 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆52Updated 4 months ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding☆27Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆103Updated 11 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy☆48Updated 2 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆125Updated 2 months ago