Unstructured-IO / unstructured.PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
★31 Updated 5 months ago
Alternatives and similar repositories for unstructured.PaddleOCR:
Users who are interested in unstructured.PaddleOCR are comparing it to the libraries listed below:
- Self-host LLMs with vLLM and BentoML ★87 Updated this week
- Pipeline for converting PDFs to raw text with PaddleOCR ★21 Updated last year
- Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform ★37 Updated last year
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ★22 Updated 4 months ago
- Build document-native LLM applications ★51 Updated 5 months ago
- ★22 Updated 8 months ago
- ★16 Updated 9 months ago
- Data Questionnaire Agent Chatbot ★64 Updated last week
- ★19 Updated 3 weeks ago
- Embedding models from Jina AI ★58 Updated last year
- Natural Language Interfaces Powered by LLMs ★91 Updated 6 months ago
- ★38 Updated last year
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r… ★58 Updated 7 months ago
- Own your AI, search the web with it ★79 Updated last month
- Routing on Random Forest (RoRF) ★114 Updated 4 months ago
- Python API for https://vespa.ai, the open big data serving engine ★113 Updated this week
- The Swarm Ecosystem ★19 Updated 6 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ★100 Updated 2 months ago
- 🍳 AyaMCooking is a Voice-to-Voice Multi-lingual RAG Agent that makes a perfect sous chef for your kitchen, in up to 10 Languages 🧑‍🍳 ★21 Updated 3 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ★57 Updated last month
- A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images ★31 Updated last year
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ★59 Updated 3 months ago
- Large language model for mastering data analysis using pandas ★46 Updated last year
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ★22 Updated last year
- LangEvals aggregates various language model evaluators into a single platform, providing a standard interface for a multitude of scores a… ★44 Updated this week
- AI search: your data + 10 lines of code. ★75 Updated 6 months ago
- Simple Implementation of a Transformer in the new framework MLX by Apple ★20 Updated 3 months ago
- Browser-based Voice Assistant ★44 Updated last year
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular… ★66 Updated 2 weeks ago