StatCan / SLICEmyPDF
This project uses the SLICE algorithm to extract information from a text-based PDF page containing financial statements (tabular data). It can also be used to extract regular tables, but the output will include all of the text on the page.
☆66 · Updated 4 years ago
Alternatives and similar repositories for SLICEmyPDF
Users who are interested in SLICEmyPDF are comparing it to the libraries listed below.
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆461 · Updated 2 years ago
- OpenEDGAR (openedgar.io) ☆321 · Updated 3 years ago
- 📛 Fuzzy Name Matching with Machine Learning ☆266 · Updated last year
- Code for http://www.python4cpas.com/ ☆36 · Updated 6 years ago
- Python-based parser for parsing XBRL and iXBRL files ☆149 · Updated last week
- Python APIs for Open PermID ☆15 · Updated 2 years ago
- Python implementation of Benford's Law tests. ☆152 · Updated 3 years ago
- Simplifies use of the Dedupe library via Pandas ☆136 · Updated 2 years ago
- Extracting Semi-Structured Data from PDFs on a large scale ☆52 · Updated 3 years ago
- Python library to extract tabular data from images and scanned PDFs ☆285 · Updated last year
- Super Fast String Matching in Python ☆371 · Updated 10 months ago
- Python module that makes using the World Bank's API a lot easier and more intuitive. ☆171 · Updated last year
- ☆42 · Updated 5 years ago
- Simplify DOCX files to JSON ☆256 · Updated last year
- A Python library for reading XBRL reports ☆42 · Updated this week
- Adobe PDFServices Python SDK Samples ☆161 · Updated 6 months ago
- Dataset and pre-trained model for Skill2vec ☆84 · Updated last year
- LexNLP by LexPredict ☆762 · Updated last year
- SECDatabase.com produced this dataset with the text and detailed numeric information of all financial statements. The Dataset is extracte… ☆84 · Updated 4 years ago
- Python application used to download, parse, and extract structured/unstructured data from filings in the SEC Edgar Database (including 10… ☆116 · Updated last week
- Using Natural Language Processing to standardize Company Names ☆11 · Updated 4 years ago
- Multiple and Large PDF Documents Text Extraction. ☆131 · Updated 11 months ago
- Preprocessing pipeline notebooks and API supporting text extraction from SEC documents ☆148 · Updated 2 years ago
- Example project showing how to host multiple Streamlit apps on Heroku behind an nginx proxy with authentication ☆80 · Updated 3 years ago
- 📈 The panel-highcharts package makes it easy to use HighCharts in Python, Notebooks and with HoloViz Panel. ☆159 · Updated 3 years ago
- Example projects demonstrating access to the Refinitiv Data Platform using the Python Library ☆26 · Updated 10 months ago
- A small library to access files from SEC's EDGAR ☆242 · Updated last year
- Company Name Processor written in Python ☆350 · Updated 3 weeks ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆56 · Updated 2 years ago
- My personal receipts collected all over the world ☆82 · Updated last year