py-pdf / benchmarks
Benchmarking PDF libraries
☆204 Updated 10 months ago
Related projects:
- Python bindings to PDFium ☆349 Updated this week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆282 Updated this week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆180 Updated 3 weeks ago
- ☆316 Updated 8 months ago
- A fast and lightweight pure Python library for splitting text into semantically meaningful chunks. ☆142 Updated 2 months ago
- Extract structured text from pdfs quickly ☆292 Updated 3 weeks ago
- Easily embed, cluster and semantically label text datasets ☆434 Updated 5 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆77 Updated 2 months ago
- ☆150 Updated this week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆113 Updated 2 weeks ago
- Software that makes labeling PDFs easy. ☆383 Updated 4 months ago
- UniTable: Towards a Unified Table Foundation Model ☆340 Updated 3 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆109 Updated 8 months ago
- Fast lexical search library implementing BM25 in Python using Numpy and Scipy ☆770 Updated this week
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ☆73 Updated 2 months ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆100 Updated this week
- Adobe PDFServices python SDK Samples ☆125 Updated 3 months ago
- ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆439 Updated 2 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆235 Updated last year
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆298 Updated last week
- Demos, examples and utilities using PyMuPDF ☆548 Updated 2 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆117 Updated 3 weeks ago
- Object Detection Model for Scanned Documents ☆77 Updated 11 months ago
- 80x faster and 95% accurate language identification with Fasttext ☆131 Updated 7 months ago
- SpanMarker for Named Entity Recognition ☆384 Updated last month
- Logical structure analysis for visually structured documents ☆80 Updated 2 years ago
- ☆323 Updated 9 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆456 Updated last year
- Neural Search ☆333 Updated 3 months ago
- Simply, faster, sentence-transformers ☆127 Updated 3 weeks ago