Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆76Updated last week
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆40Updated 8 months ago
- Self-hosted llmapi server that makes it really easy to access LLMs!☆37Updated 2 years ago
- A tool to OCR PDFs using gen-AI models☆45Updated 5 months ago
- Data extraction with Donut ML model☆57Updated last year
- ☆199Updated 2 weeks ago
- A Python library to define and validate data types in Docling.☆204Updated this week
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.☆29Updated 2 years ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg…☆24Updated 8 months ago
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six☆196Updated 11 months ago
- Demo app with Loguru logging, async middleware to generate X-request-Id. Works with Gunicorn or Uvicorn, and is safe to use with async/th…☆10Updated 3 years ago
- A JS web client for connecting to Pipecat bots with voice and vision☆45Updated 11 months ago
- A set of tools to create synthetically-generated data from documents☆36Updated 3 months ago
- Split and analyze text files using langchain and streamlit☆50Updated last year
- Open-source observability for your LLM application.☆52Updated 10 months ago
- A library to extract the main content from HTML. Developed for information on LLM and for feeding data into LangChain and LlamaIndex.☆50Updated last year
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information.☆77Updated last year
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO…☆52Updated 8 months ago
- ☆40Updated 2 years ago
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular…☆76Updated last month
- Embedding models from Jina AI☆65Updated last year
- Easy to deploy. A cloud service for a Python code interpreter sandbox for Code-Interpreter.☆56Updated last year
- POC Port of the openai-realtime-console to streamlit.☆53Updated last year
- An open-source, cloud-native serving framework for large multi-modal models (LMMs).☆164Updated 2 years ago
- A simple Next.js frontend to explore your local weaviate collections and data☆38Updated 5 months ago
- Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.☆149Updated 7 months ago
- Multimodal RAG with PyMuPDF☆43Updated last year
- ChatData 🔍 📖 brings RAG to real applications with FREE✨ knowledge bases. Now enjoy your chat with 6 million wikipedia pages and 2 milli…☆178Updated last year
- Unattended Lightweight Text Classifiers with LLM Embeddings☆184Updated last year
- This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging t…☆36Updated 9 months ago