Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆23 · Updated last year
Alternatives and similar repositories for pipeline-paddleocr
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆38 · Updated 3 months ago
- GLiNER model in a FastAPI microservice. ☆44 · Updated 6 months ago
- Query Expansion for Better Query Embedding using LLMs ☆52 · Updated 4 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆51 · Updated 3 months ago
- ☆22 · Updated last year
- This repo is for handling Question Answering, especially for Multi-hop Question Answering ☆67 · Updated last year
- Deploy a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embeddings models. ☆42 · Updated 11 months ago
- A series of tutorials implementing a RAG service with BentoML and LlamaIndex ☆43 · Updated 6 months ago
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular… ☆71 · Updated last month
- ☆61 · Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆66 · Updated 7 months ago
- An integration of Qdrant ANN vector database backend with txtai ☆24 · Updated 10 months ago
- Efficient few-shot learning with cross-encoders. ☆53 · Updated last year
- Data extraction with Donut ML model ☆57 · Updated 10 months ago
- ☆19 · Updated 4 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 · Updated this week
- Explore a curated collection of exceptional open-source libraries for generative AI meticulously reviewed or slated for review by The AI … ☆57 · Updated last year
- Keyword Extraction and Analysis Pipeline & Application with KeyBERT and Taipy ☆17 · Updated 2 years ago
- ☆187 · Updated last week
- Elasticsearch integration into LangChain ☆57 · Updated 4 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆126 · Updated last year
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆71 · Updated 7 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆96 · Updated 11 months ago
- Open-source observability for your LLM application. ☆53 · Updated 5 months ago
- This repository contains the code for implementation of RAG approach with company policies data, evaluation of RAG solution and smart chu… ☆13 · Updated last year
- ☆31 · Updated last month
- A flexible and easy to use tool for Semantic Routing ☆18 · Updated 9 months ago
- Repository for deepdoctection tutorial notebooks ☆45 · Updated last week
- Benchmark study on LanceDB, an embedded vector DB, for full-text search and vector search ☆26 · Updated last year
- Application configuration and scripts for search on https://docs.vespa.ai/ ☆12 · Updated last week