VikParuchuri / pdftext
Extract structured text from PDFs quickly
☆393 Updated this week
Alternatives and similar repositories for pdftext:
Users who are interested in pdftext are comparing it to the libraries listed below:
- Lightweight, performant, deep table extraction ☆393 Updated last month
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆234 Updated last week
- Fast Semantic Text Deduplication ☆472 Updated this week
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆251 Updated last month
- TF-ID: Table/Figure IDentifier for academic papers ☆228 Updated 6 months ago
- A Comprehensive Benchmark for Document Parsing and Evaluation ☆211 Updated last week
- UniTable: Towards a Unified Table Foundation Model ☆413 Updated 7 months ago
- ☆170 Updated last week
- Python bindings to PDFium ☆493 Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,261 Updated last week
- ☆207 Updated 6 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆705 Updated this week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆728 Updated 2 months ago
- Structured information extraction from documents ☆299 Updated 4 months ago
- High-performance retrieval engine for unstructured data ☆1,128 Updated 2 weeks ago
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆220 Updated 2 weeks ago
- Math OCR model that outputs LaTeX and markdown ☆1,006 Updated 2 months ago
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆319 Updated this week
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine ☆426 Updated 2 weeks ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆314 Updated last month
- Detect and extract tables to markdown and csv ☆723 Updated this week
- ☆201 Updated last month
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆523 Updated last month
- Solving data for LLMs - Create quality synthetic datasets! ☆144 Updated last week