Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provi…☆41Updated 10 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆80Updated this week
- ☆201Updated this week
- A set of tools to create synthetically-generated data from documents☆39Updated 5 months ago
- Open-source observability for your LLM application.☆53Updated last year
- Split and analyze text files using langchain and streamlit☆49Updated last year
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support of tabular…☆81Updated 2 weeks ago
- Data extraction with Donut ML model☆57Updated last year
- ☆20Updated 11 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information.☆77Updated last year
- Docling core data types and transformations☆221Updated this week
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB.☆73Updated last year
- Self-hosted llmapi server that makes it really easy to access LLMs!☆37Updated 2 years ago
- A Python library to chunk/group your texts based on semantic similarity.☆103Updated last year
- A simple Next.js frontend to explore your local weaviate collections and data☆39Updated 7 months ago
- A JS web client for connecting to Pipecat bots with voice and vision☆45Updated last year
- This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging t…☆37Updated 11 months ago
- ☆40Updated 2 years ago
- Local Ollama with Qdrant RAG: Embed, index, and enhance models for retrieval-augmented generation. Get started with easy setup for powerf…☆25Updated last year
- ☆125Updated 10 months ago
- Multimodal RAG with PyMuPDF☆43Updated last year
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.☆29Updated 2 years ago
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six☆198Updated last year
- Source code of the food discovery demo built on top of Qdrant☆48Updated 2 years ago
- A Prodigy plugin for PDF annotation☆36Updated 5 months ago
- Chat language model that can interpret and execute functions/plugins☆14Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding☆133Updated 2 years ago
- Embedding models from Jina AI☆65Updated 2 years ago
- Efficient, consistent and secure library for querying structured data with natural language☆164Updated 9 months ago
- An open-source, cloud-native serving framework for large multi-modal models (LMMs).☆165Updated 2 years ago