Unstructured-IO / pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
☆21 Updated last year
Alternatives and similar repositories for pipeline-paddleocr:
Users who are interested in pipeline-paddleocr are comparing it to the libraries listed below.
- ☆22 Updated last year
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆34 Updated 2 weeks ago
- A library to extract the main content from html. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆37 Updated 10 months ago
- Open-source observability for your LLM application. ☆51 Updated 3 months ago
- Data extraction with Donut ML model ☆57 Updated 7 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆123 Updated last year
- RAGLight is a lightweight and modular Python library for implementing Retrieval-Augmented Generation (RAG), Agentic RAG and RAT (Retrieva… ☆24 Updated last week
- Repository for deepdoctection tutorial notebooks ☆43 Updated 4 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆46 Updated 5 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆67 Updated this week
- Elasticsearch integration into LangChain ☆56 Updated last month
- Deployment of a light and full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models. ☆41 Updated 8 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆105 Updated 3 months ago
- ☆49 Updated 9 months ago
- ☆176 Updated 2 weeks ago
- Python package to parse PDFs with different parsers ☆35 Updated 3 months ago
- Efficient few-shot learning with cross-encoders. ☆50 Updated last year
- ☆120 Updated last month
- The easiest and most comprehensive framework for building enterprise-grade NL2SQL solutions at scale. ☆38 Updated 3 months ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering ☆67 Updated last year
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆61 Updated 6 months ago
- A Unified Toolkit for Deep Learning-Based Table Extraction ☆33 Updated 4 months ago
- A python library to define and validate data types in Docling. ☆96 Updated this week
- Web Interface for Vision Language Models Including InternVLM2 ☆20 Updated 8 months ago
- ☆45 Updated last year
- An integration of Qdrant ANN vector database backend with txtai ☆24 Updated 7 months ago
- Embedding models from Jina AI ☆58 Updated last year
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆63 Updated 3 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆67 Updated 5 months ago
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ☆22 Updated last year