Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23 · Updated last year
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆37 · Updated 2 months ago
- A library to extract the main content from HTML. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆40 · Updated last year
- ☆22 · Updated last year
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆50 · Updated 2 months ago
- Data extraction with Donut ML model ☆57 · Updated 9 months ago
- Self-host llmapi server, making it really easy to access LLMs! ☆37 · Updated 2 years ago
- Elasticsearch integration into LangChain ☆57 · Updated 3 months ago
- A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images ☆39 · Updated last year
- Application configuration and scripts for search on https://docs.vespa.ai/ ☆12 · Updated this week
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆51 · Updated 8 months ago
- Query Expansion for Better Query Embedding using LLMs ☆51 · Updated 3 months ago
- ☆18 · Updated 3 months ago
- ☆20 · Updated this week
- Repository for deepdoctection tutorial notebooks ☆45 · Updated 6 months ago
- A Python library to define and validate data types in Docling. ☆137 · Updated last week
- ☆183 · Updated this week
- ☆122 · Updated 3 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆126 · Updated last year
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆53 · Updated 4 months ago
- ☆22 · Updated 2 months ago
- A flexible and easy to use tool for Semantic Routing ☆18 · Updated 8 months ago
- Web Interface for Vision Language Models Including InternVLM2 ☆22 · Updated 10 months ago
- ☆32 · Updated last year
- Python API for https://vespa.ai, the open big data serving engine ☆126 · Updated this week
- Demo example of consumer goods categorization ☆28 · Updated last year
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆40 · Updated last year
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆27 · Updated 2 years ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering ☆67 · Updated last year
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 · Updated last week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆70 · Updated 7 months ago