py-pdf / benchmarks
Benchmarking PDF libraries
☆315 Updated 4 months ago
Alternatives and similar repositories for benchmarks
Users who are interested in benchmarks are comparing it to the libraries listed below.
- Python bindings to PDFium, reasonably cross-platform. ☆668 Updated this week
- Extract structured text from pdfs quickly ☆620 Updated 5 months ago
- ☆197 Updated last week
- Streamlit PDF viewer ☆187 Updated this week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆418 Updated 2 weeks ago
- Simple package to extract text with coordinates from programmatic PDFs ☆213 Updated last week
- ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆613 Updated 3 months ago
- A python library to define and validate data types in Docling. ☆201 Updated this week
- UniTable: Towards a Unified Table Foundation Model ☆512 Updated last year
- Software that makes labeling PDFs easy. ☆421 Updated last year
- Adobe PDFServices python SDK Samples ☆160 Updated 3 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,384 Updated last week
- ☆164 Updated 2 weeks ago
- 📚 Process PDFs, Word documents and more with spaCy ☆800 Updated 8 months ago
- Late Interaction Models Training & Retrieval ☆642 Updated last week
- Show the differences between two strings/text as a compact text, in markdown/HTML, in the terminal and more. ☆147 Updated 2 weeks ago
- Fast Semantic Text Deduplication & Filtering ☆830 Updated 2 weeks ago
- ☆387 Updated last year
- Python API for https://vespa.ai, the open big data serving engine ☆146 Updated this week
- A spaCy wrapper for GliNER ☆123 Updated 9 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆395 Updated 2 years ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆193 Updated last week
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆880 Updated last month
- OCR Benchmark ☆590 Updated 3 weeks ago
- PyMuPDF4LLM ☆1,113 Updated this week
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ☆83 Updated 10 months ago
- Visualize Different Text Splitting Methods ☆301 Updated 10 months ago
- ☆238 Updated 5 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆212 Updated last month
- library supporting NLP and CV research on scientific papers ☆785 Updated last year