Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23 Updated last year
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆38 Updated 4 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 Updated this week
- ☆187 Updated 2 weeks ago
- A microframework for creating simple AI agents. ☆90 Updated 11 months ago
- Data extraction with Donut ML model ☆57 Updated 11 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆126 Updated last year
- 📃 A contracts clause summarization system using LLM and vector database ☆18 Updated 4 months ago
- A series of tutorials implementing a RAG service with BentoML and LlamaIndex ☆44 Updated 6 months ago
- Excel spreadsheet crawler and table parser for data extraction and querying ☆148 Updated 4 months ago
- ☆49 Updated last year
- Repository for deepdoctection tutorial notebooks ☆45 Updated 3 weeks ago
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six ☆196 Updated 7 months ago
- Multimodal RAG with PyMuPDF ☆36 Updated 9 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆52 Updated 9 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ☆67 Updated 9 months ago
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I… ☆105 Updated 2 weeks ago
- Fully working applications that demonstrate how to use Haystack to implement various use cases ☆121 Updated 3 months ago
- Build reliable, secure, and production-ready AI apps easily. ☆74 Updated this week
- Self-hosted llmapi server that makes it really easy to access LLMs! ☆37 Updated 2 years ago
- ☆19 Updated 5 months ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering ☆67 Updated last year
- Web Interface for Vision Language Models Including InternVLM2 ☆22 Updated 11 months ago
- ☆122 Updated 4 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆51 Updated 4 months ago
- An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy. ☆285 Updated 3 weeks ago
- A Python library to define and validate data types in Docling. ☆152 Updated last week
- A simple Next.js frontend to explore your local Weaviate collections and data ☆29 Updated last month
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆67 Updated 6 months ago
- Deploy a light and full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models. ☆42 Updated last year
- A Python library to chunk/group your texts based on semantic similarity. ☆97 Updated last year