Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23 · Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆41 · Updated 9 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆79 · Updated this week
- ☆20 · Updated 10 months ago
- ☆201 · Updated last week
- Open-source observability for your LLM application. ☆53 · Updated 11 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ☆77 · Updated last year
- Self-hosted llmapi server that makes it really easy to access LLMs! ☆37 · Updated 2 years ago
- Data extraction with Donut ML model ☆57 · Updated last year
- A Python library to chunk/group your texts based on semantic similarity. ☆101 · Updated last year
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support of tabular… ☆79 · Updated this week
- A library to extract the main content from HTML. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆52 · Updated last year
- Unattended Lightweight Text Classifiers with LLM Embeddings ☆187 · Updated last year
- Local Ollama with Qdrant RAG: Embed, index, and enhance models for retrieval-augmented generation. Get started with easy setup for powerf… ☆24 · Updated last year
- Excel spreadsheet crawler and table parser for data extraction and querying ☆164 · Updated 9 months ago
- A JS web client for connecting to Pipecat bots with voice and vision ☆44 · Updated last year
- The easiest and most comprehensive framework for building enterprise-grade NL2SQL solutions at scale. ☆46 · Updated last year
- ☆66 · Updated last year
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six ☆197 · Updated last year
- ☆40 · Updated 2 years ago
- ☆125 · Updated 10 months ago
- Multimodal RAG with PyMuPDF ☆43 · Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆131 · Updated last year
- Keyword Extraction and Analysis Pipeline & Application with KeyBERT and Taipy ☆16 · Updated 2 years ago
- GLiNER model in a FastAPI microservice. ☆47 · Updated last year
- A simple Next.js frontend to explore your local Weaviate collections and data ☆39 · Updated 6 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆73 · Updated last year
- A tool to OCR PDFs using gen-AI models ☆45 · Updated last week
- A set of tools to create synthetically-generated data from documents ☆39 · Updated 4 months ago
- Split and analyze text files using LangChain and Streamlit ☆50 · Updated last year
- Generate Pydantic models from JSON Schema ☆23 · Updated 2 years ago