[NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner.
☆37 · Apr 23, 2026 · Updated last week
Alternatives and similar repositories for InfantAgent
Users that are interested in InfantAgent are comparing it to the libraries listed below.
- The raw UserRL repo under construction ☆99 · Sep 25, 2025 · Updated 7 months ago
- ☆32 · Jul 3, 2025 · Updated 9 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 10 months ago
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆26 · Mar 6, 2026 · Updated last month
- Website for HKU NLP group (under construction) ☆14 · Mar 20, 2026 · Updated last month
- ☆130 · Oct 3, 2025 · Updated 6 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆158 · May 29, 2025 · Updated 11 months ago
- An official repository for GPTailor ☆17 · Jun 29, 2025 · Updated 10 months ago
- ☆33 · May 9, 2025 · Updated 11 months ago
- Benchmark of complex, multimodal desktop-oriented tasks for advanced GUI-navigation AI agents ☆24 · May 7, 2025 · Updated 11 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ☆29 · Feb 25, 2025 · Updated last year
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆28 · Feb 17, 2026 · Updated 2 months ago
- ☆14 · Mar 11, 2025 · Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 10 months ago
- ☆32 · Jan 28, 2026 · Updated 3 months ago
- Utility to use ElevenLabs streaming from the command line ☆11 · Aug 8, 2023 · Updated 2 years ago
- ☆22 · May 3, 2025 · Updated 11 months ago
- [ACL 2026 Findings] Official code repo for the paper "LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark" ☆47 · Apr 18, 2026 · Updated last week
- A paper list of multilingual pre-trained models (continually updated). ☆24 · Jun 18, 2024 · Updated last year
- Implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆29 · Jul 9, 2025 · Updated 9 months ago
- Some tutorials for blog: simonjisu.github.io ☆23 · Mar 25, 2021 · Updated 5 years ago
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆15 · Feb 25, 2025 · Updated last year
- [ICCV 2025 Highlight] Less is More: Empowering GUI Agent with Context-Aware Simplification ☆47 · Mar 12, 2026 · Updated last month
- ☆15 · Mar 2, 2025 · Updated last year
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆102 · May 20, 2025 · Updated 11 months ago
- Now you can date a Zoom meeting with AI's help. ☆14 · Jun 22, 2025 · Updated 10 months ago
- ☆55 · Apr 14, 2026 · Updated 2 weeks ago
- ☆34 · Sep 19, 2025 · Updated 7 months ago
- ☆20 · Apr 24, 2024 · Updated 2 years ago
- Container-free RL framework for training software engineering agents ☆54 · Mar 4, 2026 · Updated last month
- [ICLR 2026] JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence ☆79 · Feb 9, 2026 · Updated 2 months ago
- Official Repo for MageBench: Bridging Large Multimodal Models to Agents ☆22 · Jan 8, 2025 · Updated last year
- ☆13 · Oct 19, 2023 · Updated 2 years ago
- Eko Browser Extension Template ☆40 · May 21, 2025 · Updated 11 months ago
- Urban Generative Intelligence (UGI): A Foundational Platform for Embodied Agent and Future City ☆12 · Dec 17, 2023 · Updated 2 years ago
- [ACL 2025] Research code for the paper "OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents" ☆21 · Jun 19, 2025 · Updated 10 months ago
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter". ☆12 · Dec 27, 2023 · Updated 2 years ago
- ☆13 · May 22, 2023 · Updated 2 years ago