[NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner.
☆36 · Feb 25, 2026 · Updated last month
Alternatives and similar repositories for InfantAgent
Users that are interested in InfantAgent are comparing it to the libraries listed below.
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents ☆24 · Jan 6, 2026 · Updated 3 months ago
- The raw UserRL repo under construction ☆97 · Sep 25, 2025 · Updated 6 months ago
- ☆20 · Apr 3, 2025 · Updated last year
- ☆32 · Jul 3, 2025 · Updated 9 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 9 months ago
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆24 · Mar 6, 2026 · Updated last month
- Website for HKU NLP group (under construction) ☆14 · Mar 20, 2026 · Updated 3 weeks ago
- ☆18 · Mar 2, 2026 · Updated last month
- ☆129 · Oct 3, 2025 · Updated 6 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆155 · May 29, 2025 · Updated 10 months ago
- An official repository for GPTailor ☆17 · Jun 29, 2025 · Updated 9 months ago
- ☆33 · May 9, 2025 · Updated 11 months ago
- Benchmark of complex, multimodal desktop-oriented tasks for advanced GUI-navigation AI agents ☆24 · May 7, 2025 · Updated 11 months ago
- Official Repo of Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents ☆71 · Oct 28, 2025 · Updated 5 months ago
- ☆41 · Aug 20, 2025 · Updated 7 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ☆28 · Feb 25, 2025 · Updated last year
- Code & data for "RoboGround: Robotic Manipulation with Grounded Vision-Language Priors" (CVPR 2025) ☆43 · May 25, 2025 · Updated 10 months ago
- ☆14 · Mar 11, 2025 · Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 9 months ago
- ☆13 · Jan 19, 2026 · Updated 2 months ago
- Advances and Frontiers of LLM-based Issue Resolution in Software Engineering A Comprehensive Survey ☆75 · Apr 1, 2026 · Updated last week
- ☆22 · May 3, 2025 · Updated 11 months ago
- [ICCV 2025] Improving 3D Large Language Model via Robust Instruction Tuning ☆70 · Oct 19, 2025 · Updated 5 months ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- [ACL 2026 Findings] Official code repo for the paper "LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark" ☆46 · Updated this week
- This is an official repository for Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study (ICCV2023… ☆24 · Sep 29, 2023 · Updated 2 years ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆29 · Jul 9, 2025 · Updated 9 months ago
- some tutorials for blog: simonjisu.github.io ☆23 · Mar 25, 2021 · Updated 5 years ago
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆15 · Feb 25, 2025 · Updated last year
- Container-free RL framework for training software engineering agents ☆50 · Mar 4, 2026 · Updated last month
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆51 · Dec 23, 2024 · Updated last year
- ☆15 · Mar 2, 2025 · Updated last year
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆102 · May 20, 2025 · Updated 10 months ago
- Now you can date a Zoom meeting with AI's help. ☆14 · Jun 22, 2025 · Updated 9 months ago
- ☆55 · Jun 4, 2025 · Updated 10 months ago
- Deduplication over dis-aggregated memory for Serverless Computing ☆14 · Mar 21, 2022 · Updated 4 years ago
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges ☆28 · May 14, 2025 · Updated 10 months ago
- ☆20 · Apr 24, 2024 · Updated last year
- [ICLR 2026] JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence ☆78 · Feb 9, 2026 · Updated 2 months ago