Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23 · Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provi… ☆41 · Updated 10 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆80 · Updated this week
- ☆201 · Updated last week
- ☆20 · Updated last year
- Open-source observability for your LLM application. ☆53 · Updated last year
- Data extraction with Donut ML model ☆57 · Updated last year
- Application configuration and scripts for search on https://docs.vespa.ai/ ☆12 · Updated 3 weeks ago
- Self-hosted llmapi server that makes it easy to access LLMs! ☆37 · Updated 2 years ago
- Excel spreadsheet crawler and table parser for data extraction and querying ☆164 · Updated 11 months ago
- An open-source, cloud-native serving framework for large multimodal models (LMMs). ☆165 · Updated 2 years ago
- Open-source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆29 · Updated 2 years ago
- Multimodal RAG with PyMuPDF ☆43 · Updated last year
- A simple Next.js frontend to explore your local Weaviate collections and data ☆40 · Updated last week
- Using LlamaIndex, Redis, and OpenAI to chat with PDF documents. Supplementary material for a blog post on the Microsoft Developer Blog ☆114 · Updated 2 years ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆74 · Updated last year
- Docling core data types and transformations ☆225 · Updated this week
- Demo example of consumer goods categorization ☆30 · Updated 2 years ago
- Build document-native LLM applications ☆56 · Updated last year
- Embedding models from Jina AI ☆65 · Updated 2 years ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ☆77 · Updated last year
- Repository for deepdoctection tutorial notebooks ☆50 · Updated last month
- The easiest and most comprehensive framework for building enterprise-grade NL2SQL solutions at scale. ☆47 · Updated last year
- 🚀 A list of Haystack Integrations, maintained by the community or deepset. ☆99 · Updated last week
- ☆40 · Updated 2 years ago
- ☆125 · Updated 11 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆103 · Updated last year
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆26 · Updated 11 months ago
- Simplifies the process of creating and managing LLM workflows. ☆113 · Updated last year
- CLIP as a service: embed images and sentences, object recognition, visual reasoning, image classification, and reverse image search ☆66 · Updated 6 months ago
- VerifAI initiative to build open-source easy-to-deploy generative question-answering engine that can reference and verify answers for cor… ☆76 · Updated 4 months ago