py-pdf / benchmarks
Benchmarking PDF libraries
☆302 · Updated last month
Alternatives and similar repositories for benchmarks
Users interested in benchmarks are comparing it to the libraries listed below.
- Python bindings to PDFium, reasonably cross-platform. ☆599 · Updated this week
- ☆189 · Updated last month
- Extract structured text from pdfs quickly ☆516 · Updated last month
- A python library to define and validate data types in Docling. ☆160 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆153 · Updated 3 weeks ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆347 · Updated last month
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆186 · Updated last week
- Streamlit PDF viewer ☆163 · Updated last month
- 📚 Process PDFs, Word documents and more with spaCy ☆686 · Updated 4 months ago
- ☆132 · Updated last week
- Show the differences between two strings/text as a compact text, in markdown/HTML, in the terminal and more. ☆136 · Updated last month
- Adobe PDFServices python SDK Samples ☆154 · Updated 2 weeks ago
- Repository for deepdoctection tutorial notebooks ☆46 · Updated last month
- ⚡️ A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆579 · Updated last week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆209 · Updated 2 months ago
- UniTable: Towards a Unified Table Foundation Model ☆487 · Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆127 · Updated last year
- Viewer for the structure extracted by Grobid on PDF documents ☆52 · Updated 2 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆173 · Updated 10 months ago
- Generalist and Lightweight Model for Text Classification ☆148 · Updated last month
- A Python Search Engine for Humans 🥸 ☆226 · Updated last year
- Software that makes labeling PDFs easy. ☆416 · Updated last year
- Late Interaction Models Training & Retrieval ☆511 · Updated 2 weeks ago
- ☆372 · Updated last year
- Python API for https://vespa.ai, the open big data serving engine ☆131 · Updated last week
- A Python library to chunk/group your texts based on semantic similarity. ☆97 · Updated last year
- Pinecone text client library ☆65 · Updated 4 months ago
- multimodal document analysis ☆165 · Updated last year
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆360 · Updated 2 years ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆52 · Updated 9 months ago