py-pdf / benchmarks
Benchmarking PDF libraries
☆271 Updated last year
Alternatives and similar repositories for benchmarks:
Users who are interested in benchmarks compare it to the libraries listed below:
- Python bindings to PDFium ☆560 Updated this week
- Extract structured text from PDFs quickly ☆469 Updated last month
- Streamlit PDF viewer ☆143 Updated this week
- A Python library to chunk/group your texts based on semantic similarity. ☆95 Updated 9 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆109 Updated 2 weeks ago
- A Python library to define and validate data types in Docling. ☆120 Updated last week
- ☆177 Updated last week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆290 Updated 3 weeks ago
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆189 Updated 3 weeks ago
- Software that makes labeling PDFs easy. ☆410 Updated 11 months ago
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆877 Updated 2 weeks ago
- Viewer for the structure extracted by Grobid on PDF documents ☆48 Updated 2 months ago
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆400 Updated this week
- Excel spreadsheet crawler and table parser for data extraction and querying ☆132 Updated last month
- Adobe PDFServices Python SDK Samples ☆148 Updated 5 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆179 Updated this week
- ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆542 Updated 9 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆197 Updated 6 months ago
- Fast Semantic Text Deduplication ☆638 Updated this week
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ☆80 Updated 3 months ago
- UniTable: Towards a Unified Table Foundation Model ☆461 Updated 10 months ago
- Lightweight, performant, deep table extraction ☆453 Updated 3 weeks ago
- A Python Search Engine for Humans 🥸 ☆216 Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆163 Updated 7 months ago
- OCR Benchmark ☆464 Updated last week
- ☆105 Updated last week
- Demos, examples and utilities using PyMuPDF ☆651 Updated 9 months ago
- Repository for deepdoctection tutorial notebooks ☆44 Updated 4 months ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆137 Updated 3 months ago
- A Python client for the Unstructured Platform API ☆98 Updated last week