Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆21 Updated last year
Alternatives and similar repositories for pipeline-paddleocr:
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆31 Updated 5 months ago
- Deploy a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embedding models. ☆40 Updated 7 months ago
- Repository for deepdoctection tutorial notebooks ☆42 Updated 2 months ago
- Open-source observability for your LLM application. ☆47 Updated last month
- ☆22 Updated 10 months ago
- A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images ☆31 Updated last year
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ☆22 Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆119 Updated last year
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. ☆27 Updated last year
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆19 Updated 2 years ago
- ☆173 Updated last week
- ☆19 Updated 2 weeks ago
- This project enhances the construction of RAG applications by addressing challenges, improving accessibility, scalability, and managing d… ☆141 Updated 10 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆38 Updated 4 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆93 Updated 7 months ago
- Easy to deploy. A cloud service for a Python code interpreter sandbox for Code-Interpreter. ☆48 Updated 11 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆100 Updated 2 months ago
- ☆22 Updated 5 months ago
- Efficient few-shot learning with cross-encoders. ☆48 Updated last year
- A library to extract the main content from HTML. Developed for providing information to LLMs and for feeding data into LangChain and LlamaIndex. ☆31 Updated 9 months ago
- ☆57 Updated 10 months ago
- Elasticsearch integration into LangChain ☆53 Updated last week
- Build event-driven workflows with Python async functions ☆32 Updated 4 months ago
- ☆49 Updated 7 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆64 Updated 3 months ago
- Self-host LLMs with vLLM and BentoML ☆86 Updated this week
- DSPy in action with open-source LLMs. ☆64 Updated 10 months ago
- A JS web client for connecting to Pipecat bots with voice and vision ☆43 Updated last month
- ☆52 Updated last year
- ☆21 Updated 4 months ago