☆59Jan 9, 2024Updated 2 years ago
Alternatives and similar repositories for pix2act
Users that are interested in pix2act are comparing it to the libraries listed below
- ☆20Apr 24, 2024Updated last year
- ☆35Mar 24, 2023Updated 2 years ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X.☆12May 31, 2024Updated last year
- The model, data and code for the visual GUI Agent SeeClick☆467Jul 13, 2025Updated 7 months ago
- VisualWebArena is a benchmark for multimodal agents.☆440Nov 9, 2024Updated last year
- A codebase for "Language Models can Solve Computer Tasks"☆240May 1, 2024Updated last year
- A Universal Platform for Training and Evaluation of Mobile Interaction☆60Sep 24, 2025Updated 5 months ago
- helper functions for processing and integrating visual language information with Qwen-VL Series Model☆17Aug 30, 2024Updated last year
- MiniWoB++: a web interaction benchmark for reinforcement learning☆371May 5, 2025Updated 9 months ago
- Causal Analysis of Agent Behavior for AI Safety☆20Jun 27, 2023Updated 2 years ago
- ☆18Jan 3, 2025Updated last year
- Computer-Use Agents as Judges for Generative UI☆43Nov 27, 2025Updated 3 months ago
- Practical examples of machine learning use cases in manufacturing☆18Jan 13, 2025Updated last year
- WebLINX is a benchmark for building web navigation agents with conversational capabilities☆160Feb 11, 2025Updated last year
- ☆18May 14, 2024Updated last year
- Real-time visualisation☆22Oct 2, 2025Updated 5 months ago
- Easily benchmark Machine Learning models on selected tasks and datasets☆16May 22, 2023Updated 2 years ago
- Hugging Face and Pyserini interoperability☆19May 18, 2023Updated 2 years ago
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training☆19Apr 26, 2022Updated 3 years ago
- ☆56Apr 28, 2025Updated 10 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆48Feb 27, 2025Updated last year
- Buildbot infrastructure☆23Jul 31, 2025Updated 7 months ago
- Official implementation for the paper "Understanding Hyperdimensional Computing for Parallel Single-Pass Learning"☆23Jun 10, 2023Updated 2 years ago
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…☆826Feb 3, 2025Updated last year
- ☆23Jul 10, 2023Updated 2 years ago
- Automatic Integration for Neural Spatio-Temporal Point Process models (AI-STPP) is a new paradigm for exact, efficient, non-parametric inf…☆25Oct 14, 2024Updated last year
- ☆23Oct 11, 2024Updated last year
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"☆28Jul 31, 2024Updated last year
- Towards Large Multimodal Models as Visual Foundation Agents☆256Apr 24, 2025Updated 10 months ago
- [CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".☆27Mar 1, 2024Updated 2 years ago
- https://pypi.org/project/intent-suggestions/☆10Sep 6, 2022Updated 3 years ago
- quick playground to animate pippin☆14Nov 11, 2024Updated last year
- This tool allows local LLM usage that can automate tasks without human intervention. The agent can call itself recursively and work on…☆20May 5, 2025Updated 9 months ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)☆255Jul 16, 2024Updated last year
- ☆32Dec 7, 2023Updated 2 years ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆136Updated this week
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers.☆11Aug 30, 2024Updated last year
- A MATLAB app to interactively navigate Ryze Tello drone, read navigation data, process image data and produce equivalent MATLAB code. Thi…☆13Oct 22, 2025Updated 4 months ago
- Chrome extension that restores the Dim (dark blue) background theme on X/Twitter☆36Feb 19, 2026Updated last week