microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
★4,768 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for OmniParser
- A better UX for chat, writing content, and coding with LLMs. ★2,602 Updated last week
- An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ★5,185 Updated 2 weeks ago
- PDF to Markdown with vision models ★6,324 Updated this week
- LLM-powered multiagent persona simulation for imagination enhancement and business insights. ★3,677 Updated last week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ★3,906 Updated last month
- Chat first code editor. To download the packaged app: ★5,124 Updated last week
- GLM-4-Voice | End-to-end Chinese-English speech dialogue model ★2,289 Updated last week
- Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud. ★2,710 Updated this week
- ★6,781 Updated 2 weeks ago
- A language model programming library. ★5,295 Updated this week
- A collection of projects designed to help developers quickly get started with building deployable applications using the Anthropic API ★6,841 Updated this week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ★5,648 Updated 2 weeks ago
- Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Llama-3, Langchain, OpenAI, Upstash, Brave & Serper ★4,660 Updated last month
- Composable building blocks to build Llama Apps ★4,594 Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ★6,053 Updated this week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ★3,540 Updated 2 weeks ago
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ★6,985 Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ★3,383 Updated this week
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ★2,598 Updated last month
- Large Action Model framework to develop AI Web Agents ★5,477 Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ★18,840 Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality! ★3,256 Updated 3 months ago
- Get your documents ready for gen AI ★9,923 Updated this week
- Build real-time multimodal AI applications 🤖🎙️📹 ★4,010 Updated this week
- An open-source RAG-based tool for chatting with your documents. ★17,436 Updated this week
- Open source Claude Artifacts – built with Llama 3.1 405B ★3,555 Updated this week
- Fast and accurate automatic speech recognition (ASR) for edge devices ★2,183 Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ★2,571 Updated last week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ★3,322 Updated this week
- The easiest way to use Agentic RAG in any enterprise ★3,866 Updated this week