StatCan / SLICEmyPDF
This project uses the SLICE algorithm to extract information from a text-based PDF page containing financial statements (tabular data). It can also extract regular tables, but the output will include all text on the page.
☆64 Updated 3 years ago
Alternatives and similar repositories for SLICEmyPDF
Users who are interested in SLICEmyPDF are comparing it to the libraries listed below.
- Google Colab Demo of CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents ☆47 Updated 3 years ago
- Python library to extract tabular data from images and scanned PDFs ☆277 Updated 11 months ago
- Simplifies use of the Dedupe library via Pandas ☆136 Updated 2 years ago
- my personal receipts collected all over the world ☆76 Updated 9 months ago
- Extracting Semi-Structured Data from PDFs on a large scale ☆52 Updated 3 years ago
- Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4 ☆283 Updated 2 years ago
- code for http://www.python4cpas.com/ ☆36 Updated 5 years ago
- 📛 Fuzzy Name Matching with Machine Learning ☆264 Updated last year
- Multiple and Large PDF Documents Text Extraction. ☆129 Updated 5 months ago
- A general purpose PDF text-layer redaction tool for Python 2/3. ☆197 Updated last year
- Example project showing how to host multiple streamlit apps on Heroku behind a nginx proxy with authentication ☆80 Updated 2 years ago
- A package for interactive visual analysis in jupyter notebooks ☆22 Updated 2 years ago
- Super Fast String Matching in Python ☆370 Updated 4 months ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆54 Updated last year
- Bare bones use-case for deploying a containerized web app (built in streamlit) on AWS. ☆92 Updated 11 months ago
- Python wrapper for xpdf ☆19 Updated 5 years ago
- Simplify DOCX files to JSON ☆244 Updated 9 months ago
- pandas_ui helps you wrangle & explore your data and create custom visualizations without digging through StackOverflow. All inside your J… ☆154 Updated 3 years ago
- OCR, Archive, Index and Search: Implementation agnostic OCR framework. ☆222 Updated last year
- BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on sc… ☆110 Updated 2 years ago
- An open-source XBRL processor for business rules, rendering and custom data reporting. See https://xbrl.us/xule for documentation and htt… ☆33 Updated 3 weeks ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 Updated this week
- Scripts and results from our OCR roundup, available on Source ☆150 Updated 6 years ago
- Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning ☆316 Updated 5 months ago
- 📈 The panel-highcharts package makes it easy to use HighCharts in Python, Notebooks and with HoloViz Panel. ☆159 Updated 2 years ago
- 🧬 A JupyterLab extension for annotating data with Prodigy ☆189 Updated 2 years ago
- Intuitive interface for fine-tuning and retraining a Tesseract OCR language model ☆9 Updated 2 weeks ago
- This repository contains an implementation of a US address parser built using spaCy NLP library. ☆37 Updated last year
- Python interface to Apache PDFBox command-line tools. ☆75 Updated 2 years ago
- For pyvis and networkx ☆85 Updated 2 years ago