py-pdf / benchmarks
Benchmarking PDF libraries
☆321 Updated 7 months ago
Alternatives and similar repositories for benchmarks
Users who are interested in benchmarks compare it to the libraries listed below.
- Python bindings to PDFium, reasonably cross-platform. ☆719 Updated last week
- ☆201 Updated last week
- Streamlit PDF viewer ☆195 Updated last week
- Simple package to extract text with coordinates from programmatic PDFs ☆238 Updated this week
- Extract structured text from pdfs quickly ☆661 Updated 7 months ago
- Docling core data types and transformations ☆225 Updated last week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆552 Updated 3 months ago
- ☆392 Updated 2 years ago
- UniTable: Towards a Unified Table Foundation Model ☆521 Updated last year
- ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆642 Updated 6 months ago
- 📚 Process PDFs, Word documents and more with spaCy ☆847 Updated 11 months ago
- ☆185 Updated 2 weeks ago
- Show the differences between two strings/text as a compact text, in markdown/HTML, in the terminal and more. ☆152 Updated 3 weeks ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆213 Updated 4 months ago
- OCR Benchmark ☆613 Updated 3 months ago
- Software that makes labeling PDFs easy. ☆426 Updated last year
- Adobe PDFServices python SDK Samples ☆161 Updated 6 months ago
- Pinecone text client library ☆67 Updated 5 months ago
- Fast Multimodal Semantic Deduplication & Filtering ☆882 Updated 2 weeks ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆935 Updated last month
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆184 Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆137 Updated 2 years ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,477 Updated this week
- A Python library to chunk/group your texts based on semantic similarity. ☆103 Updated last year
- Viewer for the structure extracted by Grobid on PDF documents ☆57 Updated 3 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆411 Updated 3 years ago
- Late Interaction Models Training & Retrieval ☆694 Updated last month
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ☆86 Updated last year
- A Python vector database you just need - no more, no less. ☆642 Updated last year
- Repository for deepdoctection tutorial notebooks ☆50 Updated last month