py-pdf / benchmarks
Benchmarking PDF libraries
☆312 Updated 3 months ago
Alternatives and similar repositories for benchmarks
Users who are interested in benchmarks are comparing it to the libraries listed below:
- Python bindings to PDFium, reasonably cross-platform. ☆647 Updated this week
- ☆194 Updated 3 weeks ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆375 Updated last month
- Simple package to extract text with coordinates from programmatic PDFs ☆197 Updated 2 weeks ago
- Extract structured text from pdfs quickly ☆605 Updated 3 months ago
- Streamlit PDF viewer ☆177 Updated 2 weeks ago
- A python library to define and validate data types in Docling. ☆185 Updated this week
- ☆147 Updated last month
- Late Interaction Models Training & Retrieval ☆608 Updated last week
- Adobe PDFServices python SDK Samples ☆158 Updated 2 months ago
- 📚 Process PDFs, Word documents and more with spaCy ☆761 Updated 6 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆192 Updated last week
- Show the differences between two strings/text as a compact text, in markdown/HTML, in the terminal and more. ☆140 Updated 3 months ago
- ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆594 Updated last month
- A Python library to chunk/group your texts based on semantic similarity. ☆97 Updated last year
- Viewer for the structure extracted by Grobid on PDF documents ☆54 Updated 2 weeks ago
- Fast Semantic Text Deduplication & Filtering ☆810 Updated 3 weeks ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆212 Updated 2 weeks ago
- ☆383 Updated last year
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,335 Updated 3 weeks ago
- Software that makes labeling PDFs easy. ☆420 Updated last year
- Excel spreadsheet crawler and table parser for data extraction and querying ☆157 Updated 7 months ago
- Repository for deepdoctection tutorial notebooks ☆46 Updated 3 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆179 Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆129 Updated last year
- Visualize Different Text Splitting Methods ☆288 Updated 9 months ago
- A very simple news crawler with a funny name ☆403 Updated this week
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ☆83 Updated 9 months ago
- Pinecone text client library ☆65 Updated last month
- Python API for https://vespa.ai, the open big data serving engine ☆143 Updated last week