pdf2htmlEX / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆5,007 Updated 11 months ago
Alternatives and similar repositories for pdf2htmlEX
Users who are interested in pdf2htmlEX are comparing it to the libraries listed below.
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆7,417 Updated this week
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables. ☆7,883 Updated last week
- Toolkit for linearizing PDFs for LLM datasets/training ☆12,940 Updated this week
- Community maintained fork of pdfminer - we fathom PDF ☆6,533 Updated last month
- Open source Python library for converting PDF to DOCX. ☆2,983 Updated 3 weeks ago
- Convert PDF to HTML without losing text or format. ☆10,494 Updated 2 years ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆7,656 Updated 4 months ago
- Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022 ☆6,339 Updated 11 months ago
- PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books. ☆2,914 Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,676 Updated this week
- Download all your kindle books script. ☆2,690 Updated 4 months ago
- A Python library to extract tabular data from PDFs ☆3,332 Updated this week
- RAG Web UI is an intelligent dialogue system based on RAG (Retrieval-Augmented Generation) technology. ☆2,469 Updated last month
- OCR & Document Extraction using vision models ☆11,350 Updated last month
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆7,919 Updated 5 months ago
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ☆2,614 Updated last month
- mupdf mirror ☆2,165 Updated this week
- qpdf: A content-preserving PDF document transformer ☆4,057 Updated last week
- Convert PDF to markdown + JSON quickly with high accuracy ☆25,975 Updated this week
- The free and privacy-friendly screen recorder with no limits 🎥 ☆14,806 Updated last month
- The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets. ☆3,782 Updated this week
- PDFium - Project to compile PDFium library to multiple platforms. ☆983 Updated this week
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆612 Updated 2 weeks ago
- Make bilingual epub books Using AI translate ☆8,482 Updated last month
- Implementation of Nougat Neural Optical Understanding for Academic Documents ☆9,493 Updated 4 months ago
- A git prepare-commit-msg hook for authoring commit messages with LLMs. ☆2,380 Updated 2 weeks ago
- Demos, examples and utilities using PyMuPDF ☆665 Updated 11 months ago
- A python module that wraps the pdftoppm utility to convert PDF to PIL Image object ☆1,813 Updated 11 months ago
- A PDF to Markdown converter ☆1,386 Updated last year
- Yet Another Document Translator ☆4,223 Updated this week