[NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner.
☆34 · Nov 10, 2025 · Updated 3 months ago
Alternatives and similar repositories for InfantAgent
Users who are interested in InfantAgent are comparing it to the repositories listed below
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents ☆21 · Jan 6, 2026 · Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- ☆30 · Jul 3, 2025 · Updated 7 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 8 months ago
- ☆125 · Oct 3, 2025 · Updated 4 months ago
- An official repository for GPTailor ☆17 · Jun 29, 2025 · Updated 8 months ago
- Website for HKU NLP group (under construction) ☆14 · Dec 23, 2025 · Updated 2 months ago
- ☆25 · Jan 28, 2026 · Updated last month
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆26 · Feb 17, 2026 · Updated last week
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆148 · May 29, 2025 · Updated 9 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ☆28 · Feb 25, 2025 · Updated last year
- The raw UserRL repo under construction ☆95 · Sep 25, 2025 · Updated 5 months ago
- Benchmark of complex, multimodal desktop-oriented tasks for advanced GUI-navigation AI agents ☆24 · May 7, 2025 · Updated 9 months ago
- ☆21 · May 3, 2025 · Updated 9 months ago
- Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges ☆28 · May 14, 2025 · Updated 9 months ago
- ☆26 · Jul 8, 2025 · Updated 7 months ago
- The paper list of multilingual pre-trained models (Continual Updated). ☆24 · Jun 18, 2024 · Updated last year
- [AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers ☆43 · Jan 1, 2026 · Updated 2 months ago
- ☆27 · Jan 22, 2025 · Updated last year
- ☆20 · Apr 24, 2024 · Updated last year
- ☆37 · Aug 20, 2025 · Updated 6 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆28 · Jul 9, 2025 · Updated 7 months ago
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆55 · Feb 1, 2026 · Updated last month
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Updated this week
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆98 · May 20, 2025 · Updated 9 months ago
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice ☆65 · Updated this week
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆37 · Oct 7, 2025 · Updated 4 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" ☆57 · Jan 23, 2026 · Updated last month
- PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability ☆37 · Mar 18, 2025 · Updated 11 months ago
- Official Repository of paper: "MotionEdit: Benchmarking and Learning Motion-Centric Image Editing" ☆59 · Jan 20, 2026 · Updated last month
- Code for paper Empowering Large Language Model Agents through Action Learning ☆33 · Aug 8, 2024 · Updated last year
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆14 · Feb 25, 2025 · Updated last year
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- ☆73 · May 23, 2025 · Updated 9 months ago
- ☆55 · Jun 4, 2025 · Updated 8 months ago
- ☆34 · May 9, 2025 · Updated 9 months ago
- R1-like Computer-use Agent ☆89 · Mar 21, 2025 · Updated 11 months ago
- Your command-line, context-aware chatbot for instant codebase insights & more ✨ ☆16 · May 30, 2024 · Updated last year