microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
★21,511 · Updated 3 weeks ago
Alternatives and similar repositories for OmniParser:
Users who are interested in OmniParser are comparing it to the repositories listed below:
- A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language. ★11,230 · Updated this week
- 🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation ★15,516 · Updated this week
- Make websites accessible for AI agents ★55,383 · Updated this week
- No fortress, purely open ground. OpenManus is Coming. ★43,151 · Updated this week
- Toolkit for linearizing PDFs for LLM datasets/training ★11,088 · Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ★35,829 · Updated this week
- A collection of MCP servers. ★38,506 · Updated this week
- OCR & Document Extraction using vision models ★10,962 · Updated this week
- Model Context Protocol Servers ★36,433 · Updated this week
- Let AI be your browser operator. ★7,917 · Updated this week
- ★3,904 · Updated 2 months ago
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ★13,566 · Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ★39,156 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ★44,418 · Updated this week
- A lightweight, powerful framework for multi-agent workflows ★8,597 · Updated this week
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python. ★5,424 · Updated last week
- Automate browser-based workflows with LLMs and Computer Vision ★12,961 · Updated this week
- Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your p… ★40,217 · Updated this week
- 🪄 Create rich visualizations with AI ★11,240 · Updated this week
- Finetune Llama 4, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥 ★36,949 · Updated last week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ★24,532 · Updated this week
- The official Python SDK for Model Context Protocol servers and clients ★9,284 · Updated this week
- MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone ★19,202 · Updated last month
- A high-performance LLM inference API and Chat UI that integrates DeepSeek R1's CoT reasoning traces with Anthropic Claude models. ★5,038 · Updated 2 months ago
- 🚀🚀 Train a 26M-parameter GPT from scratch in just 2h! ★19,109 · Updated last week
- The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more. ★42,818 · Updated this week
- Integrate the DeepSeek API into popular softwares ★31,435 · Updated last week
- Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI O… ★5,040 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ★13,215 · Updated this week
- Python scraper based on AI ★19,047 · Updated this week