mailgun / forge
email dataset for email signature parsing
☆55 · Updated 8 years ago
Alternatives and similar repositories for forge
Users who are interested in forge compare it to the libraries listed below.
- Script to rotate webserver log file to AWS S3 ☆29 · Updated 10 years ago
- Traptor -- A distributed Twitter feed ☆26 · Updated 2 years ago
- A simple algorithm for clustering web pages, suitable for crawlers ☆34 · Updated 8 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 3 years ago
- Language detection extension for spaCy 2.0+ ☆112 · Updated 6 years ago
- Parse, normalize and render postal addresses. ☆184 · Updated last year
- Server/Client around spaCy to load spaCy only once ☆46 · Updated 7 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 · Updated 3 weeks ago
- pyaddress is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apart… ☆100 · Updated 5 years ago
- A visualisation tool for spaCy using Hierplane. ☆65 · Updated 2 years ago
- Python library to infer date format from examples ☆43 · Updated 3 years ago
- Demo code for learning_text_transformer ☆25 · Updated 10 years ago
- ☆70 · Updated 2 years ago
- Python bindings for Google's FarmHash ☆39 · Updated 9 months ago
- ☆30 · Updated 2 years ago
- A small HTTP API for SyntaxNet ☆19 · Updated 6 years ago
- A Gearman worker which cURLs to do work. ☆51 · Updated 10 years ago
- Detect and classify pagination links ☆15 · Updated 4 years ago
- Aviation grade news article metadata extraction ☆36 · Updated 2 years ago
- Python binding for gumbo-parser using Cython ☆14 · Updated 8 years ago
- Extract postal addresses from the DOM ☆66 · Updated 12 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆51 · Updated 9 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆24 · Updated 11 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors ☆35 · Updated 10 years ago
- Use ML-Annotate to label data for machine learning purposes ☆109 · Updated 4 years ago
- Remove signature blocks from emails ☆86 · Updated 6 years ago
- Lightning Fast Language Prediction 🚀 ☆167 · Updated 6 years ago
- Python search module for fast approximate string matching ☆54 · Updated 2 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated 7 years ago
- Supervised learning for novelty detection in text ☆78 · Updated 8 years ago