google-deepmind/pix2act

☆60

Alternatives and similar repositories for pix2act

Users that are interested in pix2act are comparing it to the libraries listed below.

CogNLP / CogAGENT
☆35 · Mar 24, 2023 · Updated 3 years ago
ltzheng / Synapse
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆69 · Jan 7, 2026 · Updated 6 months ago
posgnu / rci-agent
A codebase for "Language Models can Solve Computer Tasks"
☆240 · May 1, 2024 · Updated 2 years ago
mklissa / phi_gcn
Reward Propagation using Graph Convolutional Networks
☆13 · Jun 19, 2021 · Updated 5 years ago
chuyg1005 / seeclick-crawler
☆20 · Apr 24, 2024 · Updated 2 years ago
njucckevin / SeeClick
The model, data and code for the visual GUI Agent SeeClick
☆492 · Jul 13, 2025 · Updated last year
Farama-Foundation / MiniWoB-plusplus
A collection of reinforcement learning environments for simple web interaction tasks
☆393 · Updated this week
OSU-NLP-Group / SeeActChromeExtension
☆18 · Jan 3, 2025 · Updated last year
OSU-NLP-Group / EIA_against_webagent
☆40 · Oct 2, 2024 · Updated last year
xbmxb / EnvDistraction
☆24 · Oct 11, 2024 · Updated last year
McGill-NLP / weblinx
WebLINX is a benchmark for building web navigation agents with conversational capabilities
☆162 · Feb 11, 2025 · Updated last year
X-LANCE / Mobile-Env
A Universal Platform for Training and Evaluation of Mobile Interaction
☆63 · Sep 24, 2025 · Updated 9 months ago
kq-chen / qwen-vl-utils
Helper functions for processing and integrating visual language information with Qwen-VL Series Model
☆17 · Aug 30, 2024 · Updated last year
mklissa / dceo
Learning diverse options through the Laplacian representation.
☆23 · Jan 5, 2024 · Updated 2 years ago
upiterbarg / hihack
[NeurIPS 2023] Official code release accompanying the paper "NetHack is Hard to Hack" (Piterbarg, Pinto, Fergus)
☆13 · Oct 30, 2023 · Updated 2 years ago
rllabmcgill / rllabmcgill.github.io
Production build of the new website
☆13 · May 19, 2024 · Updated 2 years ago
dki-lab / ArcaneQA
☆23 · Aug 14, 2023 · Updated 2 years ago
nacloos / baba-is-ai
Code for "Baba Is AI: Break the Rules to Beat the Benchmark"
☆49 · Sep 3, 2025 · Updated 10 months ago
OthersideAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆12 · Nov 27, 2023 · Updated 2 years ago
language-agent-tutorial / language-agent-tutorial.github.io
[EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks
☆10 · Nov 27, 2024 · Updated last year
THUDM / VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
☆272 · Apr 24, 2025 · Updated last year
aniketmaurya / Agents
Build Agentic workflows with function calling using open LLMs
☆27 · Jul 6, 2026 · Updated 2 weeks ago
OSU-NLP-Group / AutoSDT
[EMNLP'25] AutoSDT is a fully automatic pipeline to collect data-driven scientific coding tasks to train co-scientist models.
☆21 · Aug 11, 2025 · Updated 11 months ago
UCSB-AI / Screen-Point-and-Read
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
☆31 · May 12, 2026 · Updated 2 months ago
OSU-NLP-Group / WebDreamer
[TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"
☆104 · Oct 5, 2025 · Updated 9 months ago
IMNearth / CoAT
Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)
☆103 · Oct 14, 2024 · Updated last year
McGill-NLP / feedbackqa
FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback
☆12 · Jul 13, 2022 · Updated 4 years ago
dki-lab / few-shot-bioIE
True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning
☆12 · Jul 6, 2022 · Updated 4 years ago
yuzhu-cai / rSDE-Bench
☆36 · May 29, 2025 · Updated last year
zhaohengyuan1 / SCT
(IJCV 2023) Official implementation of "SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels"
☆13 · Mar 20, 2025 · Updated last year
distillpub / post--aia
Using Artificial Intelligence to Augment Human Intelligence
☆19 · May 22, 2018 · Updated 8 years ago
aarmea / readability-scrape
Retrieve simplified versions of webpages, powered by Mozilla's Readability.js
☆15 · Oct 14, 2018 · Updated 7 years ago
OSU-NLP-Group / SELM
Symmetric Encryption with Language Models
☆13 · Jun 13, 2023 · Updated 3 years ago
zjunlp / AutoAct
[ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning
☆238 · Jan 13, 2025 · Updated last year
virtual-puppet-project / real-time-lip-sync-gd
☆32 · Dec 7, 2023 · Updated 2 years ago
microsoft / text-to-sql-schema-expansion-generalization
Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
☆13 · Jul 26, 2023 · Updated 2 years ago
sdpmas / Scotch
In-IDE Code Search
☆29 · Apr 29, 2022 · Updated 4 years ago
DigiRL-agent / digirl
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
☆393 · Feb 22, 2025 · Updated last year
sunlab-osu / IterPrompt
☆19 · Nov 7, 2022 · Updated 3 years ago