tabulapdf / tabula
Tabula is a tool for liberating data tables trapped inside PDF files
☆7,002 Updated last month
Alternatives and similar repositories for tabula:
Users who are interested in tabula compare it to the libraries listed below.
- Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame ☆2,248 Updated 4 months ago
- Camelot: PDF Table Extraction for Humans ☆3,681 Updated 2 years ago
- A web interface to extract tabular data from PDFs ☆1,651 Updated 3 months ago
- A Python library to extract tabular data from PDFs ☆3,262 Updated this week
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,237 Updated 2 years ago
- An open source multi-tool for exploring and publishing data ☆9,966 Updated this week
- A suite of utilities for converting to and working with CSV, the king of tabular file formats. ☆6,157 Updated last week
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables. ☆7,590 Updated 3 weeks ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,288 Updated 2 years ago
- A terminal spreadsheet multitool for discovering and arranging data ☆8,174 Updated 2 weeks ago
- Convert CSV files into a SQLite database ☆899 Updated 2 weeks ago
- Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf fi… ☆1,589 Updated last year
- A fast and friendly PDF scraping library. ☆777 Updated last year
- pdfrw is a pure Python library that reads and writes PDFs ☆1,889 Updated 11 months ago
- Plotting library for IPython/Jupyter notebooks ☆3,653 Updated 2 months ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,371 Updated 6 months ago
- Python-based tools for document analysis and OCR ☆3,450 Updated 3 years ago
- Declarative visualization library for Python ☆9,724 Updated this week
- Open-source JavaScript charting library behind Plotly and Dash ☆17,509 Updated last week
- A visualization grammar. ☆11,461 Updated this week
- A post-processing tool for scanned sheets of paper. ☆1,071 Updated 9 months ago
- OpenRefine is a free, open source power tool for working with messy data and improving it ☆11,287 Updated this week
- A python module that wraps the pdftoppm utility to convert PDF to PIL Image object ☆1,768 Updated 9 months ago
- A next-generation curated knowledge sharing platform for data scientists and other technical professions. ☆5,513 Updated 7 months ago
- A Gtk/Qt front-end to tesseract-ocr. ☆1,739 Updated this week
- borb is a library for reading, creating and manipulating PDF files in python. ☆3,467 Updated 4 months ago
- Parallel computing with task scheduling ☆13,136 Updated last week
- A fast CSV command line toolkit written in Rust. ☆10,665 Updated this week
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆7,016 Updated this week
- Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs. ☆1,052 Updated last year