Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23Updated last year
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆38Updated 4 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆70Updated this week
- ☆191Updated last month
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular…☆72Updated 2 weeks ago
- ☆19Updated 5 months ago
- A library to extract the main content from HTML. Developed for information on LLM and for feeding data into LangChain and LlamaIndex.☆45Updated last year
- Data extraction with Donut ML model☆57Updated 11 months ago
- A library to convert Pydantic models to TypedDict☆30Updated 11 months ago
- Query Expansion for Better Query Embedding using LLMs☆55Updated 5 months ago
- A python library to define and validate data types in Docling.☆164Updated last week
- GLiNER model in a FastAPI microservice.☆45Updated 7 months ago
- Embedding models from Jina AI☆62Updated last year
- Generate pydantic models from JSON Schema☆22Updated last year
- The faststream-gen library uses advanced AI to generate FastStream code from user descriptions, speeding up FastStream app development.☆49Updated last year
- Self-hosted llmapi server that makes it really easy to access LLMs!☆37Updated 2 years ago
- A series of tutorials implementing a RAG service with BentoML and LlamaIndex☆46Updated 7 months ago
- Split and analyze text files using langchain and streamlit☆48Updated last year
- Multimodal RAG with PyMuPDF☆38Updated 10 months ago
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six☆197Updated 7 months ago
- Scraping and querying documents for LLMs☆23Updated 2 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information.☆70Updated 9 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg…☆23Updated 5 months ago
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I…☆113Updated 3 weeks ago
- A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images☆40Updated last year
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO…☆52Updated 4 months ago
- Redis Queue Dashboard based on FastAPI☆107Updated last week
- Turn any OCR models into online inference API endpoint 🚀 🌖☆57Updated 4 months ago
- An open source framework for Retrieval-Augmented System (RAG) that uses semantic search to help retrieve the expected results and generate h…☆21Updated last year
- Powered by SideGuide and GPT-3☆12Updated 2 years ago
- Self-hosted version of Microsoft's OmniParser Image-to-text model☆71Updated 2 months ago