StatCan / SLICEmyPDF
This project uses the SLICE algorithm to extract information from a text-based PDF page containing financial statements (tabular data). It can also extract regular tables, but the output will then include all text on the page.
★66 · Updated 4 years ago
Alternatives and similar repositories for SLICEmyPDF
Users who are interested in SLICEmyPDF are comparing it to the libraries listed below.
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ★461 · Updated 2 years ago
- Fuzzy Name Matching with Machine Learning ★266 · Updated last year
- OpenEDGAR (openedgar.io) ★321 · Updated 3 years ago
- Python library to extract tabular data from images and scanned PDFs ★285 · Updated last year
- Super Fast String Matching in Python ★371 · Updated 10 months ago
- Python-based parser for parsing XBRL and iXBRL files ★149 · Updated last week
- Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4 ★286 · Updated 3 years ago
- Name matching is a Python package for the matching of company names. This package has been developed to match the names of companies from… ★161 · Updated 2 months ago
- An open-source XBRL processor for business rules, rendering and custom data reporting. See https://xbrl.us/xule for documentation and htt… ★35 · Updated last month
- Company Name Processor written in Python ★350 · Updated 3 weeks ago
- OCR, Archive, Index and Search: Implementation agnostic OCR framework. ★224 · Updated 2 years ago
- Python APIs for Open PermID ★15 · Updated 2 years ago
- Parsing pdf tables using YOLOV3 ★121 · Updated 4 years ago
- ★42 · Updated 5 years ago
- Simple PDF text extraction ★985 · Updated 11 months ago
- Python implementation of Benford's Law tests. ★152 · Updated 3 years ago
- Simple example of using a Naive Bayesian classification to classify entries in bank statements ★158 · Updated 2 years ago
- A fast and friendly PDF scraping library. ★783 · Updated 2 years ago
- Python application used to download, parse, and extract structured/unstructured data from filings in the SEC Edgar Database (including 10… ★116 · Updated last week
- Adobe PDFServices python SDK Samples ★161 · Updated 6 months ago
- Custom recipe and utilities for document processing ★200 · Updated 3 years ago
- Demos, examples and utilities using PyMuPDF ★707 · Updated last month
- Simplifies use of the Dedupe library via Pandas ★136 · Updated 2 years ago
- Extracting Semi-Structured Data from PDFs on a large scale ★52 · Updated 3 years ago
- my personal receipts collected all over the world ★82 · Updated last year
- Google Colab Demo of CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents ★47 · Updated 4 years ago
- A small library to access files from SEC's edgar ★242 · Updated last year
- code for http://www.python4cpas.com/ ★36 · Updated 6 years ago
- demo using FuzzyWuzzy matching company names ★76 · Updated 3 years ago
- Semantic search for headlines and story text ★359 · Updated 2 years ago