microsoft / OmniParser
A simple screen parsing tool towards a pure vision-based GUI agent
★5,509 · Updated last week
Alternatives and similar repositories for OmniParser:
Users who are interested in OmniParser are comparing it to the libraries listed below.
- Task-Aware Agent-driven Prompt Optimization Framework ★2,188 · Updated last week
- A better UX for chat, writing content, and coding with LLMs. ★3,443 · Updated 2 weeks ago
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ★5,974 · Updated 2 months ago
- GLM-4-Voice | End-to-end Chinese-English voice dialogue model ★2,565 · Updated last month
- An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ★5,719 · Updated last week
- smolagents: a barebones library for agents. Agents write Python code to call tools and orchestrate other agents. ★5,197 · Updated this week
- ★7,156 · Updated this week
- A language model programming library. ★5,556 · Updated 3 weeks ago
- Flexible and powerful framework for managing multiple AI agents and handling complex conversations ★3,835 · Updated this week
- LLM-powered multiagent persona simulation for imagination enhancement and business insights. ★5,233 · Updated 2 weeks ago
- Composable building blocks to build Llama Apps ★6,036 · Updated this week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ★4,147 · Updated last month
- Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension. ★5,320 · Updated this week
- PDF to Markdown with vision models ★8,298 · Updated last month
- Parse files for optimal RAG ★3,526 · Updated last week
- Desktop app for prototyping and debugging LangGraph applications locally. ★2,284 · Updated this week
- KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning a… ★4,258 · Updated this week
- Open source Claude Artifacts, built with Llama 3.1 405B ★5,083 · Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ★6,576 · Updated this week
- Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you automate the web wi… ★2,898 · Updated last week
- File Parser optimised for LLM Ingestion with no loss. Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ★4,966 · Updated this week
- A fast multimodal LLM for real-time voice ★2,760 · Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ★7,430 · Updated this week
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ★2,022 · Updated this week
- tiny vision language model ★6,732 · Updated this week
- CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ★2,249 · Updated this week
- Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud. ★3,878 · Updated this week
- AI-Driven Browser Automation with Chrome Extensions, JavaScript, and YAML Scripts. ★3,051 · Updated this week
- Model Context Protocol Servers ★6,938 · Updated this week
- The official Python SDK for Model Context Protocol servers and clients ★1,423 · Updated this week