Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆38Updated 5 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆72Updated this week
- ☆191Updated last week
- Split and analyze text files using langchain and streamlit☆48Updated last year
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular…☆73Updated last month
- Unattended Lightweight Text Classifiers with LLM Embeddings☆184Updated 11 months ago
- Repository for deepdoctection tutorial notebooks☆46Updated 2 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information.☆71Updated 10 months ago
- ☆122Updated 6 months ago
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six☆196Updated 8 months ago
- Data extraction with Donut ML model☆56Updated last year
- ☆19Updated 6 months ago
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.☆29Updated 2 years ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO…☆52Updated 5 months ago
- Repo to experiment with Graph RAG strategies using Kùzu☆57Updated 8 months ago
- A JS web client for connecting to Pipecat bots with voice and vision☆45Updated 8 months ago
- Build document-native LLM applications☆54Updated 11 months ago
- A library to extract the main content from HTML. Developed for information on LLM and for feeding data into LangChain and LlamaIndex.☆45Updated last year
- An open source framework for Retrieval-Augmented System (RAG) that uses semantic search to retrieve the expected results and generate h…☆21Updated last year
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg…☆24Updated 5 months ago
- Embedding models from Jina AI☆64Updated last year
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I…☆114Updated last month
- Efficient, consistent and secure library for querying structured data with natural language☆160Updated 4 months ago
- A Python library to chunk/group your texts based on semantic similarity.☆97Updated last year
- Excel spreadsheet crawler and table parser for data extraction and querying☆153Updated 5 months ago
- A simple Next.js frontend to explore your local weaviate collections and data☆33Updated 2 months ago
- A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images☆40Updated last year
- Self-hosted llmapi server that makes it easy to access LLMs☆37Updated 2 years ago
- A cookiecutter template for building plugins for LLM☆28Updated 4 months ago
- Python API for https://vespa.ai, the open big data serving engine☆137Updated this week