Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr compare it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆39Updated 6 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆72Updated this week
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular…☆74Updated last month
- ☆193Updated this week
- A series of tutorials implementing a RAG service with BentoML and LlamaIndex☆48Updated 9 months ago
- Web Interface for Vision Language Models Including InternVLM2☆23Updated last year
- Excel spreadsheet crawler and table parser for data extraction and querying☆159Updated 7 months ago
- Data extraction with Donut ML model☆57Updated last year
- Open-source observability for your LLM application.☆52Updated 9 months ago
- Demo example of consumer goods categorization☆28Updated last year
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information.☆72Updated 11 months ago
- Generate pydantic models from JSON Schema☆23Updated 2 years ago
- ☆124Updated 7 months ago
- A library to extract the main content from HTML. Developed for information on LLMs and for feeding data into LangChain and LlamaIndex.☆49Updated last year
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.☆29Updated 2 years ago
- A Python library to define and validate data types in Docling.☆185Updated this week
- DocLLM: A layout-aware generative language model for multimodal document understanding☆129Updated last year
- Simplifies the process of creating and managing LLM workflows.☆108Updated 11 months ago
- ☆19Updated 8 months ago
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six☆196Updated 9 months ago
- Embedding models from Jina AI☆65Updated last year
- A tool to OCR PDFs using gen-AI models☆44Updated 3 months ago
- Split and analyze text files using langchain and streamlit☆48Updated last year
- Local Ollama with Qdrant RAG: Embed, index, and enhance models for retrieval-augmented generation. Get started with easy setup for powerf…☆21Updated last year
- A Prodigy plugin for PDF annotation☆35Updated last month
- Demo app with Loguru logging, async middleware to generate X-request-Id. Works with Gunicorn or Uvicorn, and is safe to use with async/th…☆10Updated 3 years ago
- Deploy a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embedding models.☆43Updated last year
- LLM Agents: Landing Page Generation for an E-commerce Platform Using CrewAI, Groq-LangChain and Qdrant☆14Updated last year
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K …☆83Updated 9 months ago
- Hybrid Search (BM25 & Vector) with SQLite☆23Updated last year