microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
☆21,888 Updated last month
Alternatives and similar repositories for OmniParser:
Users who are interested in OmniParser are comparing it to the libraries listed below:
- ☆5,635 Updated last week
- A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language. ☆13,137 Updated last week
- Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥 ☆38,242 Updated this week
- Suna - Open Source Generalist AI Agent ☆9,536 Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆37,616 Updated this week
- Use your locally running AI models to assist you in your web browsing ☆6,410 Updated this week
- 🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation ☆16,131 Updated this week
- Run AI Agent in your browser. ☆12,794 Updated this week
- Toolkit for linearizing PDFs for LLM datasets/training ☆12,238 Updated this week
- Fully local web research and report writing assistant ☆7,237 Updated last month
- A lightweight, powerful framework for multi-agent workflows ☆9,840 Updated this week
- 🚀 The fast, Pythonic way to build MCP servers and clients ☆8,458 Updated this week
- 🪄 Create rich visualizations with AI ☆11,455 Updated this week
- A high-performance LLM inference API and Chat UI that integrates DeepSeek R1's CoT reasoning traces with Anthropic Claude models. ☆5,092 Updated 3 months ago
- The official Python SDK for Model Context Protocol servers and clients ☆11,467 Updated this week
- A collection of MCP servers. ☆46,611 Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆6,377 Updated 2 months ago
- A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats. ☆32,914 Updated this week
- Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team. ☆19,718 Updated last month
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,307 Updated last week
- Spark-TTS Inference Code ☆9,041 Updated 3 weeks ago
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆13,903 Updated last week
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆10,169 Updated 2 weeks ago
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ☆21,842 Updated this week
- An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large… ☆15,909 Updated 3 weeks ago
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆24,976 Updated last week
- No fortress, purely open ground. OpenManus is Coming. ☆45,114 Updated last week
- Make websites accessible for AI agents ☆58,844 Updated this week
- The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets. ☆3,437 Updated this week
- An open source deep research clone. AI Agent that reasons large amounts of web data extracted with Firecrawl ☆5,480 Updated 2 months ago