StatCan / SLICEmyPDF

This project uses the SLICE algorithm to extract information from a text-based PDF page containing financial statements (tabular data). It can also be used to extract regular tables, but the output will contain all text on the page.

☆66

Alternatives and similar repositories for SLICEmyPDF

Users who are interested in SLICEmyPDF are comparing it to the libraries listed below.

HazyResearch / pdftotree
A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
☆461 Updated 2 years ago
LexPredict / openedgar
OpenEDGAR (openedgar.io)
☆321 Updated 3 years ago
Christopher-Thornton / hmni
📛 Fuzzy Name Matching with Machine Learning
☆266 Updated last year
danshorstein / python4cpas
code for http://www.python4cpas.com/
☆36 Updated 6 years ago
manusimidt / py-xbrl
Python-based parser for parsing XBRL and iXBRL files
☆149 Updated last week
LSEG-API-Samples / Article.OpenPermID.Python.APIs
Python APIs for Open PermID
☆15 Updated 2 years ago
milcent / benford_py
Python implementation of Benford's Law tests.
☆152 Updated 3 years ago
Lyonk71 / pandas-dedupe
Simplifies use of the Dedupe library via Pandas
☆136 Updated 2 years ago
janedoesrepo / pdfreader
Extracting Semi-Structured Data from PDFs on a large scale
☆52 Updated 3 years ago
ExtractTable / ExtractTable-py
Python library to extract tabular data from images and scanned PDFs
☆285 Updated last year
Bergvca / string_grouper
Super Fast String Matching in Python
☆371 Updated 10 months ago
tgherzog / wbgapi
Python module that makes using the World Bank's API a lot easier and more intuitive.
☆171 Updated last year
Pirimid / financial-documents-ocr-deep-learning
☆42 Updated 5 years ago
microsoft / Simplify-Docx
Simplify DOCX files to JSON
☆256 Updated last year
BrelLibrary / brel
A Python library for reading XBRL reports
☆42 Updated this week
adobe / pdfservices-python-sdk-samples
Adobe PDFServices python SDK Samples
☆161 Updated 6 months ago
duyet / skill2vec-dataset
Dataset and pre-trained model for Skill2vec
☆84 Updated last year
LexPredict / lexpredict-lexnlp
LexNLP by LexPredict
☆762 Updated last year
secdatabase / SEC-XBRL-Financial-Statement-Dataset
SECDatabase.com produced this dataset with the text and detailed numeric information of all financial statements. The Dataset is extracte…
☆84 Updated 4 years ago
ryansmccoy / py-sec-edgar
Python application used to download, parse, and extract structured/unstructured data from filings in the SEC Edgar Database (including 10…
☆116 Updated last week
sachinchaturvedi93 / Company-Name-Standardization
Using Natural Language Processing to standardize Company Names
☆11 Updated 4 years ago
ahmedkhemiri95 / PDFs-TextExtract
Multiple and Large PDF Documents Text Extraction.
☆131 Updated 11 months ago
Unstructured-IO / pipeline-sec-filings
Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
☆148 Updated 2 years ago
Taxuspt / heroku_streamlit_nginx
Example project showing how to host multiple streamlit apps on Heroku behind a nginx proxy with authentication
☆80 Updated 3 years ago
awesome-panel / panel-highcharts
📈 The panel-highcharts package makes it easy to use HighCharts in Python, Notebooks and with HoloViz Panel.
☆159 Updated 3 years ago
LSEG-API-Samples / Example.RDPLibrary.Python
Example projects demonstrating access to the Refinitiv Data Platform using the Python Library
☆26 Updated 10 months ago
joeyism / py-edgar
A small library to access files from SEC's edgar
☆242 Updated last year
psolin / cleanco
Company Name Processor written in Python
☆350 Updated 3 weeks ago
explosion / spacy-ray
☄️ Parallel and distributed training with spaCy and Ray
☆56 Updated 2 years ago
JensWalter / my-receipts
my personal receipts collected all over the world
☆82 Updated last year