Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
☆14,383 · Apr 3, 2026 · Updated this week
Alternatives and similar repositories for unstructured
Users who are interested in unstructured are comparing it to the libraries listed below.
- LlamaIndex is the leading document agent and OCR platform ☆48,180 · Updated this week
- DSPy: The framework for programming—not prompting—language models ☆33,275 · Mar 27, 2026 · Updated last week
- Supercharge Your LLM Application Evaluations 🚀 ☆13,195 · Feb 24, 2026 · Updated last month
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a… ☆24,642 · Mar 27, 2026 · Updated last week
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆41,858 · Updated this week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆31,971 · Updated this week
- structured outputs for llms ☆12,630 · Updated this week
- The agent engineering platform ☆132,305 · Updated this week
- Universal memory layer for AI Agents ☆51,533 · Updated this week
- A programming framework for agentic AI ☆56,603 · Updated this week
- Convert PDF to markdown + JSON quickly with high accuracy ☆33,138 · Mar 10, 2026 · Updated 3 weeks ago
- Get your documents ready for gen AI ☆56,773 · Updated this week
- Structured Outputs ☆13,608 · Mar 26, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆74,805 · Updated this week
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,351 · Mar 27, 2026 · Updated last week
- Data infrastructure for AI ☆27,029 · Updated this week
- A guidance language for controlling large language models. ☆21,365 · Mar 18, 2026 · Updated 2 weeks ago
- Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work t… ☆47,739 · Updated this week
- Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time. ☆21,879 · Updated this week
- Build, run, manage agentic software at scale. ☆39,153 · Updated this week
- 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with Open… ☆24,270 · Updated this week
- Knowledge Agents and Management in the Cloud ☆4,248 · Mar 25, 2026 · Updated last week
- Build AI Agents, Visually ☆51,204 · Mar 27, 2026 · Updated last week
- RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat… ☆77,002 · Updated this week
- Build Conversational AI in minutes ⚡️ ☆11,826 · Mar 28, 2026 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,538 · Mar 1, 2026 · Updated last month
- ☆899 · Mar 27, 2026 · Updated last week
- Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with struc… ☆15,936 · Updated this week
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,620 · Jul 14, 2025 · Updated 8 months ago
- Semantic cache for LLMs. Fully integrated with LangChain and llama_index. ☆7,970 · Jul 11, 2025 · Updated 8 months ago
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… ☆29,939 · Updated this week
- An autonomous agent that conducts deep research on any data using any LLM providers ☆26,202 · Mar 14, 2026 · Updated 3 weeks ago
- Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally. ☆58,639 · Updated this week
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆4,913 · Updated this week
- Retrieval and Retrieval-augmented LLMs ☆11,479 · Mar 27, 2026 · Updated last week
- An open-source RAG-based tool for chatting with your documents. ☆25,237 · Mar 28, 2026 · Updated last week
- Large Language Model Text Generation Inference ☆10,817 · Mar 21, 2026 · Updated 2 weeks ago
- Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) ☆12,786 · Mar 23, 2026 · Updated last week
- Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search ☆43,596 · Updated this week