Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆20 Updated last year
Related projects
Alternatives and complementary repositories for pipeline-paddleocr
- ☆21 Updated 7 months ago
- ☆24 Updated last year
- 🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform ☆36 Updated 9 months ago
- Deploy a light and full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models. ☆37 Updated 3 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆85 Updated 3 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆47 Updated 7 months ago
- Data extraction with Donut ML model ☆55 Updated 2 months ago
- Efficient few-shot learning with cross-encoders. ☆40 Updated 8 months ago
- A fast and lightweight pure Python library for splitting text into semantically meaningful chunks. ☆172 Updated 3 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆112 Updated 10 months ago
- A library to extract the main content from html. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆21 Updated 5 months ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering ☆64 Updated 10 months ago
- Self-host LLMs with vLLM and BentoML ☆72 Updated this week
- ☆31 Updated 11 months ago
- Experimental Code for StructuredRAG: Structured Outputs in Retrieval-Augmented Generation ☆90 Updated this week
- ☆49 Updated 8 months ago
- Langchain Agent utilizing OpenAI Function Calls to execute Git commands using Natural Language ☆44 Updated last year
- ☆105 Updated last month
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆30 Updated 2 months ago
- Self-hosted llmapi server that makes it really easy to access LLMs! ☆36 Updated last year
- One Line To Build Zero-Data Classifiers in Minutes ☆30 Updated last month
- ☆178 Updated last month
- YOLO models trained on DocLayNet: power your document intelligence with layout analysis ☆60 Updated last month
- ☆21 Updated 3 weeks ago
- An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy. ☆255 Updated this week
- Application configuration and scripts for search on https://docs.vespa.ai/ ☆13 Updated last week
- 🦦 weasel: A small and easy workflow system ☆67 Updated 4 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆18 Updated last month
- Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more ☆43 Updated 7 months ago
- ☆31 Updated this week