py-pdf / benchmarks
Benchmarking PDF libraries
☆263 · Updated last year
Alternatives and similar repositories for benchmarks:
Users interested in benchmarks also compare it with the libraries listed below.
- Python bindings to PDFium ☆542 · Updated this week
- Extract structured text from PDFs quickly ☆433 · Updated last week
- Simple package to extract text with coordinates from programmatic PDFs ☆77 · Updated this week
- Streamlit PDF viewer ☆132 · Updated last month
- A Python library to define and validate data types in Docling. ☆79 · Updated this week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆259 · Updated this week
- Running Docling as an API service ☆140 · Updated this week
- A Comprehensive Benchmark for Document Parsing and Evaluation ☆277 · Updated 2 weeks ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆189 · Updated 5 months ago
- UniTable: Towards a Unified Table Foundation Model ☆440 · Updated 9 months ago
- A spaCy wrapper for GliNER ☆108 · Updated last month
- multimodal document analysis ☆163 · Updated 9 months ago
- Viewer for the structure extracted by Grobid on PDF documents ☆46 · Updated last month
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆390 · Updated last month
- Additional packages (components, document stores and the like) to extend the capabilities of Haystack version 2.0 and onwards ☆136 · Updated this week
- Generalist and Lightweight Model for Text Classification ☆90 · Updated this week
- ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆527 · Updated 8 months ago
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆810 · Updated this week
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ☆79 · Updated 2 months ago
- Fast Semantic Text Deduplication ☆567 · Updated last week
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆323 · Updated 2 years ago
- A Python library to chunk/group your texts based on semantic similarity. ☆94 · Updated 8 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆177 · Updated this week
- Software that makes labeling PDFs easy. ☆407 · Updated 10 months ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆270 · Updated this week