xlang-ai / OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☆1,397 Updated this week
Related projects
Alternatives and complementary repositories for OSWorld
- Llama-3 agents that can browse the web by following instructions and talking to you ☆1,351 Updated 4 months ago
- A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 30.67% of tasks (pass@1) in SWE-b… ☆2,723 Updated 3 weeks ago
- Agent S: an open agentic framework that uses computers like a human ☆606 Updated this week
- OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophist… ☆1,602 Updated 6 months ago
- Automated Design of Agentic Systems ☆1,038 Updated this week
- A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks. ☆1,524 Updated 2 months ago
- Agentless🐱: an agentless approach to automatically solve software development problems ☆723 Updated last week
- The first open source Large Action Model generalist Artificial Narrow Intelligence agentic framework that controls completely human user … ☆1,264 Updated 5 months ago
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆483 Updated this week
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ☆647 Updated last week
- AutoGroq is a groundbreaking tool that revolutionizes the way users interact with Autogen™ and other AI assistants. By dynamically genera… ☆1,341 Updated 3 months ago
- ☆896 Updated 6 months ago
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆496 Updated 5 months ago
- Agent-driven automation starting with the web. Discord: https://discord.gg/wgNfmFuqJF ☆818 Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality! ☆3,256 Updated 3 months ago
- Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app ☆1,265 Updated this week
- A framework for Claude Opus to intelligently orchestrate subagents. ☆4,159 Updated 4 months ago
- Gemma 2B with 10M context length using Infini-attention. ☆949 Updated 6 months ago
- PraisonAI application combines AutoGen and CrewAI or similar frameworks into a low-code solution for building and managing multi-agent LL… ☆2,287 Updated last week
- The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051… ☆1,772 Updated this week
- We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 … ☆814 Updated 4 months ago
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,598 Updated last month
- An autoagentic AGI that is self-evolving and modular. ☆892 Updated 2 months ago
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI … ☆2,111 Updated this week
- ✨ Fully autonomous AI Agent that can perform complicated tasks and projects using terminal, browser, and editor. ☆2,147 Updated 6 months ago
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,529 Updated 4 months ago
- Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models" ☆342 Updated 8 months ago
- Code for Quiet-STaR ☆651 Updated 2 months ago
- ☆407 Updated last month
- [ICLR 2024] SWE-bench: Can Language Models Resolve Real-world GitHub Issues? ☆2,007 Updated this week