☆30Apr 16, 2024Updated 2 years ago
Alternatives and similar repositories for assistgui
Users that are interested in assistgui are comparing it to the libraries listed below.
- Repository of GUI Action Narrator☆13Apr 8, 2025Updated last year
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.☆116Jul 27, 2025Updated 8 months ago
- Official implementation of WebVLN: Vision-and-Language Navigation on Websites☆35Jan 2, 2024Updated 2 years ago
- The model, data and code for the visual GUI Agent SeeClick☆478Jul 13, 2025Updated 9 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos☆52Feb 22, 2026Updated last month
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning"☆35Nov 11, 2025Updated 5 months ago
- ☆20Apr 24, 2024Updated last year
- ☆15Nov 3, 2022Updated 3 years ago
- ☆32Sep 27, 2024Updated last year
- Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.☆18Apr 23, 2023Updated 2 years ago
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.☆1,163Aug 17, 2025Updated 8 months ago
- Official Code of MICCAI'25 Best-Paper-Shortlist paper "MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group…☆38Sep 28, 2025Updated 6 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆141Mar 1, 2026Updated last month
- The dataset includes screen summaries that describe the functionality of Android app screenshots. It is used for training and evaluation of …☆67Jul 27, 2021Updated 4 years ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin…☆13Jul 27, 2025Updated 8 months ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents☆231Jun 16, 2025Updated 10 months ago
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models☆63Feb 22, 2026Updated last month
- [CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering☆20Sep 21, 2024Updated last year
- Code for "CNN^2: Viewpoint Generalization via a Binocular Vision" (NeurIPS 2019)☆11Aug 7, 2021Updated 4 years ago
- ☆18Nov 1, 2024Updated last year
- ☆12Aug 8, 2024Updated last year
- Official implementation of "MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation"☆28Apr 3, 2026Updated 2 weeks ago
- [WSDM 2024] Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding☆18Mar 6, 2024Updated 2 years ago
- Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments☆61Aug 19, 2024Updated last year
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools.☆47Dec 17, 2025Updated 4 months ago
- This project explores different techniques (both scalable and non-scalable) for graph-based semi-supervised learning. Recent techniqu…☆14May 28, 2016Updated 9 years ago
- This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.☆15Feb 12, 2024Updated 2 years ago
- Codebase for Hyperdecoders https://arxiv.org/abs/2203.08304☆14Oct 11, 2022Updated 3 years ago
- ☆21Jun 14, 2024Updated last year
- Speed up Python's shutil.copyfile by using sendfile system call☆11Aug 2, 2018Updated 7 years ago
- code for Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection☆19Mar 4, 2024Updated 2 years ago
- Dialog2Flow: convert your dialogs to flows. This repository accompanies the paper "Dialog2Flow: Pre-training Soft-Contrastive Sentence Em…☆18Jul 1, 2025Updated 9 months ago
- ☆15Nov 23, 2023Updated 2 years ago
- Generate images from an initial frame and text☆37Jul 28, 2023Updated 2 years ago
- A back-tester for testing stock trading strategies on historical data☆19Feb 18, 2026Updated 2 months ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)☆256Jul 16, 2024Updated last year
- Official codebase for CuGRO: Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay☆33Apr 14, 2024Updated 2 years ago
- [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.☆1,774Jan 20, 2026Updated 2 months ago
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation☆67Aug 9, 2024Updated last year