Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆21 · Updated last year
Alternatives and similar repositories for pipeline-paddleocr:
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below:
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆33 · Updated 6 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆41 · Updated 5 months ago
- A JS web client for connecting to Pipecat bots with voice and vision ☆43 · Updated 2 months ago
- ☆22 · Updated 11 months ago
- Build reliable, secure, and production-ready AI apps easily. ☆65 · Updated last week
- ☆174 · Updated last week
- Elasticsearch integration into LangChain ☆55 · Updated last month
- Repository for deepdoctection tutorial notebooks ☆43 · Updated 3 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ☆62 · Updated 5 months ago
- 🦦 weasel: A small and easy workflow system ☆75 · Updated 8 months ago
- ☆59 · Updated 11 months ago
- Data extraction with Donut ML model ☆57 · Updated 7 months ago
- Efficient few-shot learning with cross-encoders. ☆50 · Updated last year
- This project enhances the construction of RAG applications by addressing challenges, improving accessibility, scalability, and managing d… ☆142 · Updated 11 months ago
- Multimodal LLM Application with PyMuPDF4LLM ☆35 · Updated 5 months ago
- Deploy a light, full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models. ☆41 · Updated 7 months ago
- Dynamic Metadata based RAG Framework ☆72 · Updated 7 months ago
- Lightweight Non-Parametric Embedding Fine-Tuning ☆23 · Updated 5 months ago
- GLiNER model in a FastAPI microservice. ☆39 · Updated 3 months ago
- A library to extract the main content from HTML. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆34 · Updated 9 months ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆63 · Updated 4 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆123 · Updated last year
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆56 · Updated 5 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆104 · Updated 3 months ago
- Keyword Extraction and Analysis Pipeline & Application with KeyBERT and Taipy ☆17 · Updated last year
- ☆118 · Updated 2 weeks ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆38 · Updated last year
- A novel multi-modality (Vision) RAG architecture ☆23 · Updated 5 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆94 · Updated 8 months ago