VikParuchuri / pdftext
Extract structured text from pdfs quickly
☆485 Updated this week
Alternatives and similar repositories for pdftext
Users who are interested in pdftext are comparing it to the libraries listed below.
- UniTable: Towards a Unified Table Foundation Model ☆473 Updated last year
- Detect and extract tables to markdown and csv ☆747 Updated 4 months ago
- Lightweight, performant, deep table extraction ☆466 Updated this week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆315 Updated 2 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆789 Updated 4 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳 ☆292 Updated this week
- ☆183 Updated this week
- OCR Benchmark ☆495 Updated last week
- TF-ID: Table/Figure IDentifier for academic papers ☆235 Updated 10 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,436 Updated last week
- Simple package to extract text with coordinates from programmatic PDFs ☆126 Updated last month
- ☆225 Updated 6 months ago
- Structured information extraction from documents ☆315 Updated 8 months ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,265 Updated last week
- Fast Semantic Text Deduplication & Filtering ☆697 Updated last week
- Python bindings to PDFium ☆578 Updated last week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL ☆992 Updated this week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆937 Updated 3 weeks ago
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine ☆461 Updated 4 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,193 Updated this week
- ☆122 Updated this week
- A python library to define and validate data types in Docling. ☆137 Updated last week
- Code for explaining and evaluating late chunking (chunked pooling) ☆396 Updated 5 months ago
- FastAPI wrapper around DSPy ☆243 Updated last year
- Things you can do with the token embeddings of an LLM ☆1,443 Updated 2 months ago
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆430 Updated this week
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆205 Updated 2 months ago
- clean & curate your data with LLMs. ☆492 Updated 11 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆809 Updated 6 months ago
- Developer APIs to Accelerate LLM Projects ☆1,675 Updated 7 months ago