jstray / deepform
Using ML to extract campaign finance data from messy forms for journalism
☆75 Updated 2 years ago
Related projects:
- Experimental form data extraction for journalism ☆76 Updated 3 years ago
- A visualisation tool for spaCy using Hierplane. ☆65 Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- ☆10 Updated this week
- Excel Integration with spaCy. Training NER using Excel/XLSX from PDF, DOCX, PPT, PNG or JPG. ☆105 Updated last year
- Running Prodigy for a team of annotators ☆53 Updated 3 years ago
- A collection of simple tutorials for using Fonduer ☆100 Updated 3 years ago
- 🚀 GUI for training spaCy models ☆53 Updated 3 years ago
- Package that returns a company embedding given a company name ☆42 Updated 4 years ago
- Searching large heterogeneous data dumps with Universal Sentence Encoder ☆62 Updated 3 years ago
- Anonymization of legal cases (Fr) based on Flair embeddings ☆87 Updated 3 years ago
- Dataframe Integration with spaCy. ☆100 Updated 3 years ago
- Information extraction from English and German texts based on predicate logic ☆133 Updated last year
- ☆65 Updated 2 years ago
- Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health. ☆87 Updated 2 years ago
- Group thousands of similar spreadsheet or database text entries in seconds ☆155 Updated last year
- Notebooks configured to be run with Binder, usually found on my blog. ☆41 Updated last year
- Analyze XML extracted from PDFs (e.g. from TET or PDFMiner) ☆20 Updated 6 years ago
- Python package for performing deduplication using flexible text matching and cleaning in pandas dataframes ☆25 Updated 3 years ago
- Language detection using spaCy and fastText ☆53 Updated 9 months ago
- A browser user interface for manual labeling of record pairs. ☆41 Updated last year
- Generate reports for spaCy models. ☆28 Updated 2 years ago
- Inspect a URL and estimate if it contains a news story ☆39 Updated 3 weeks ago
- Natural Language Generation for Gramex applications. ☆24 Updated 2 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated 6 months ago
- Table Extraction Tool ☆89 Updated 6 years ago
- ☆17 Updated this week
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 3 years ago
- NERtwork is a collection of scripts to help you create a network graph of co-occurring named entities using open source tools. This is do… ☆49 Updated 5 months ago
- Python script for matching a list of messy addresses against a gazetteer using dedupe. ☆60 Updated 4 years ago