microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
★4,485 · Updated last week
Related projects
Alternatives and complementary repositories for OmniParser
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ★6,914 · Updated this week
- Build real-time multimodal AI applications 🤖🎙️📹 ★3,929 · Updated this week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ★3,499 · Updated last week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ★3,864 · Updated last month
- ★6,692 · Updated last week
- GLM-4-Voice | End-to-end Chinese-English voice dialogue model ★2,182 · Updated this week
- Fast and accurate automatic speech recognition (ASR) for edge devices ★2,107 · Updated last week
- An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ★5,118 · Updated this week
- PDF to Markdown with vision models ★5,981 · Updated this week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ★5,490 · Updated last week
- Composable building blocks to build Llama Apps ★4,496 · Updated this week
- A better UX for chat, writing content, and coding with LLMs. ★2,502 · Updated this week
- Get your documents ready for gen AI ★7,698 · Updated this week
- The easiest way to use Agentic RAG in any enterprise ★3,834 · Updated last week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ★2,538 · Updated last month
- rewind.ai x cursor.com = your AI assistant that has all the context ★8,803 · Updated this week
- Open Source framework for voice and multimodal conversational AI ★3,346 · Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ★5,901 · Updated last week
- Automate browser-based workflows with LLMs and Computer Vision ★10,242 · Updated this week
- A language model programming library. ★5,226 · Updated this week
- Inference and training library for high-quality TTS models. ★4,592 · Updated last week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ★3,274 · Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality! ★3,220 · Updated 3 months ago
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs ★1,462 · Updated 2 weeks ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ★6,873 · Updated this week
- A collection of notebooks/recipes showcasing some fun and effective ways of using Claude. ★6,775 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ★13,943 · Updated this week
- ★3,005 · Updated 2 weeks ago
- Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI. ★14,893 · Updated this week
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ★6,265 · Updated this week