Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

☆15,097

Alternatives and similar repositories for unstructured

Users who are interested in unstructured are comparing it to the libraries listed below.

run-llama / llama_index
LlamaIndex is the leading document agent and OCR platform
☆50,730 · Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆35,904 · Jul 5, 2026 · Updated last week
vibrantlabsai / ragas
Supercharge Your LLM Application Evaluations 🚀
☆14,773 · Feb 24, 2026 · Updated 4 months ago
deepset-ai / haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…
☆25,836 · Jul 6, 2026 · Updated last week
BerriAI / litellm
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a…
☆53,080 · Updated this week
567-labs / instructor
structured outputs for llms
☆13,420 · Jul 6, 2026 · Updated last week
microsoft / graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
☆34,373 · Updated this week
langchain-ai / langchain
The agent engineering platform.
☆141,568 · Updated this week
microsoft / autogen
A programming framework for agentic AI
☆59,623 · Apr 15, 2026 · Updated 2 months ago
mem0ai / mem0
Universal memory layer for AI Agents
☆60,342 · Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,195 · Jul 6, 2026 · Updated last week
dottxt-ai / outlines
Structured Outputs
☆14,399 · Jul 6, 2026 · Updated last week
docling-project / docling
Get your documents ready for gen AI
☆62,790 · Jul 7, 2026 · Updated last week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆85,665 · Updated this week
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,705 · Jul 2, 2026 · Updated last week
chroma-core / chroma
Search infrastructure for AI
☆28,708 · Jul 6, 2026 · Updated last week
guidance-ai / guidance
A guidance language for controlling large language models.
☆21,656 · May 21, 2026 · Updated last month
letta-ai / letta
Platform for stateful agents: AI with advanced memory that can learn and self-improve over time.
☆23,746 · Jul 3, 2026 · Updated last week
crewAIInc / crewAI
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work t…
☆55,144 · Updated this week
agno-agi / agno
Build, run, and manage your own agent platform.
☆41,090 · Updated this week
run-llama / llama_cloud_services
Knowledge Agents and Management in the Cloud
☆4,251 · May 18, 2026 · Updated last month
FlowiseAI / Flowise
Build AI Agents, Visually
☆54,540 · Jul 6, 2026 · Updated last week
langfuse / langfuse
🪢 Open source AI engineering platform: LLM evals, observability, metrics, prompt management, playground, datasets. Integrates with OpenT…
☆30,855 · Updated this week
Chainlit / chainlit
Build Conversational AI in minutes ⚡️
☆12,298 · Jun 11, 2026 · Updated last month
infiniflow / ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat…
☆84,783 · Updated this week
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,058 · Updated this week
Unstructured-IO / unstructured-api
☆937 · Jun 19, 2026 · Updated 3 weeks ago
weaviate / weaviate
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with struc…
☆16,550 · Updated this week
weaviate / Verba
Retrieval Augmented Generation (RAG) chatbot powered by Weaviate
☆7,711 · Jun 8, 2026 · Updated last month
zilliztech / GPTCache
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
☆8,094 · Jul 11, 2025 · Updated last year
qdrant / qdrant
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl…
☆33,022 · Jul 8, 2026 · Updated last week
assafelovic / gpt-researcher
An autonomous agent that conducts deep research on any data using any LLM providers
☆28,246 · Jul 5, 2026 · Updated last week
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,027 · Jun 29, 2026 · Updated 2 weeks ago
unslothai / unsloth
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
☆67,875 · Jul 7, 2026 · Updated last week
FlagOpen / FlagEmbedding
Retrieval and Retrieval-augmented LLMs
☆11,914 · Apr 22, 2026 · Updated 2 months ago
Cinnamon / kotaemon
An open-source RAG-based tool for chatting with your documents.
☆25,536 · Jun 9, 2026 · Updated last month
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,870 · Mar 21, 2026 · Updated 3 months ago
ShishirPatil / gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
☆12,942 · Apr 13, 2026 · Updated 3 months ago
milvus-io / milvus
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
☆45,198 · Updated this week