Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr compare it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆40Updated 7 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆73Updated this week
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular…☆75Updated last week
- ☆197Updated this week
- Self-hosted LLM API server that makes it easy to access LLMs.☆37Updated 2 years ago
- Open-source observability for your LLM application.☆52Updated 9 months ago
- Easy to deploy. A cloud service providing a Python code interpreter sandbox for Code-Interpreter.☆55Updated last year
- Data extraction with Donut ML model☆57Updated last year
- A library to extract the main content from HTML. Developed for information on LLM and for feeding data into LangChain and LlamaIndex.☆50Updated last year
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six☆196Updated 10 months ago
- Demo app with Loguru logging, async middleware to generate X-request-Id. Works with Gunicorn or Uvicorn, and is safe to use with async/th…☆10Updated 3 years ago
- Repository for deepdoctection tutorial notebooks☆45Updated 4 months ago
- ☆65Updated last year
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information.☆72Updated last year
- A Python library to chunk/group your texts based on semantic similarity.☆97Updated last year
- ☆40Updated 2 years ago
- ☆19Updated 8 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB.☆73Updated 10 months ago
- A simple Next.js frontend to explore your local Weaviate collections and data☆37Updated 4 months ago
- A Pythonic library providing a lightweight interface to LLMs☆129Updated 5 months ago
- Takes plain text as input and generates SQL commands using OpenAI's GPT-3☆15Updated 5 years ago
- DocLLM: A layout-aware generative language model for multimodal document understanding☆129Updated last year
- Embedding models from Jina AI☆65Updated last year
- A Prodigy plugin for PDF annotation☆35Updated 2 months ago
- Unattended Lightweight Text Classifiers with LLM Embeddings☆184Updated last year
- Excel spreadsheet crawler and table parser for data extraction and querying☆160Updated 7 months ago
- Simplifies the process of creating and managing LLM workflows.☆110Updated last year
- Application configuration and scripts for search on https://docs.vespa.ai/☆12Updated this week
- ☆23Updated 7 months ago
- A Python library to define and validate data types in Docling.☆198Updated this week