Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆21Updated last year
Alternatives and similar repositories for pipeline-paddleocr:
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below:
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆32Updated 4 months ago
- Self-hosted llmapi server that makes it really easy to access LLMs!☆36Updated last year
- A JS web client for connecting to Pipecat bots with voice and vision☆42Updated 3 weeks ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.☆22Updated this week
- ☆21Updated 10 months ago
- Open-source observability for your LLM application.☆47Updated 2 weeks ago
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod…☆19Updated 2 years ago
- This is a proof of concept of using an LLM to find and extract meaningful data without parsing the HTML too much.☆28Updated last year
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder.☆11Updated last year
- Deploy a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embedding models.☆39Updated 6 months ago
- Repo to experiment with Graph RAG strategies using Kùzu☆42Updated last month
- ☆167Updated this week
- A Python library to chunk/group your texts based on semantic similarity.☆90Updated 6 months ago
- OpenAI compatible API for open source LLMs☆15Updated last year
- Data extraction with Donut ML model☆57Updated 5 months ago
- ☆16Updated 7 months ago
- ☆36Updated 8 months ago
- 🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform☆37Updated 11 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment.☆34Updated 3 months ago
- ☆20Updated 4 months ago
- ☆109Updated this week
- Generate pydantic models from JSON Schema☆21Updated last year
- A Python library to define and validate data types in Docling.☆56Updated this week
- DocLLM: A layout-aware generative language model for multimodal document understanding☆119Updated last year
- Easy to deploy. A cloud service providing a Python code interpreter sandbox for Code-Interpreter.☆48Updated 10 months ago
- ☆52Updated 11 months ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering☆66Updated last year
- Repository for deepdoctection tutorial notebooks☆40Updated last month
- This project enhances the construction of RAG applications by addressing challenges, improving accessibility, scalability, and managing d…☆141Updated 9 months ago