Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23 Updated 2 years ago
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆40 Updated 8 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆77 Updated this week
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ☆77 Updated last year
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular… ☆78 Updated last month
- A simple Next.js frontend to explore your local weaviate collections and data ☆39 Updated 5 months ago
- Taking normal text as input and generating SQL commands using OpenAI's GPT-3 ☆15 Updated 5 years ago
- ☆20 Updated 10 months ago
- Generate pydantic models from JSON Schema ☆23 Updated 2 years ago
- Open-source observability for your LLM application. ☆53 Updated 11 months ago
- ☆199 Updated 2 weeks ago
- An open-source, cloud-native serving framework for large multi-modal models (LMMs). ☆164 Updated 2 years ago
- The faststream-gen library uses advanced AI to generate FastStream code from user descriptions, speeding up FastStream app development. ☆48 Updated last year
- Excel spreadsheet crawler and table parser for data extraction and querying ☆164 Updated 9 months ago
- A tool to OCR PDFs using gen-AI models ☆45 Updated 5 months ago
- Simplifies the process of creating and managing LLM workflows. ☆112 Updated last year
- A set of tools to create synthetically-generated data from documents ☆37 Updated 3 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆26 Updated 9 months ago
- A library to extract the main content from html. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆51 Updated last year
- Scraping and querying documents for LLMs ☆24 Updated 2 months ago
- Data extraction with Donut ML model ☆57 Updated last year
- A Prodigy plugin for PDF annotation ☆36 Updated 4 months ago
- Unattended Lightweight Text Classifiers with LLM Embeddings ☆186 Updated last year
- Use a LlamaIndex Agent as a backend service ☆20 Updated last year
- A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks. ☆46 Updated 3 months ago
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I… ☆119 Updated 4 months ago
- Embedding models from Jina AI ☆65 Updated last year
- I will be adding code for different kinds of open-source data extraction tools using Python ☆10 Updated last year
- Agent that routes to different tools - LLM classifier SDK ☆45 Updated last year
- A simple MCP ODBC server using FastAPI, ODBC and SQLAlchemy. ☆19 Updated 6 months ago
- A microframework for creating simple AI agents. ☆94 Updated last year