mailgun / forge
Email dataset for email signature parsing
☆55 · Updated 8 years ago
Alternatives and similar repositories for forge:
Users who are interested in forge compare it to the libraries listed below.
- Traptor -- A distributed Twitter feed · ☆26 · Updated 2 years ago
- Hidden alignment conditional random field for classifying string pairs. · ☆24 · Updated 4 months ago
- Fuzzy Categorical Distances · ☆14 · Updated 4 years ago
- Modularly extensible semantic metadata validator · ☆83 · Updated 9 years ago
- A component that tries to avoid downloading duplicate content · ☆27 · Updated 6 years ago
- Aviation grade news article metadata extraction · ☆36 · Updated last year
- remove signature blocks from emails · ☆86 · Updated 5 years ago
- A simple algorithm for clustering web pages, suitable for crawlers · ☆34 · Updated 7 years ago
- BUNT is a Bot UNderstanding Testbed · ☆36 · Updated 8 years ago
- Detect and classify pagination links · ☆15 · Updated 4 years ago
- Find which links on a web page are pagination links · ☆29 · Updated 8 years ago
- Algorithms for "schema matching" · ☆25 · Updated 8 years ago
- mltk - Moz Language Tool Kit · ☆12 · Updated 9 years ago
- A Cython implementation of the affine gap string distance · ☆57 · Updated 2 years ago
- Read natural language interactive queries. Great for bots. · ☆18 · Updated 8 years ago
- Language detection extension for spaCy 2.0+ · ☆112 · Updated 5 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… · ☆48 · Updated 3 years ago
- pyaddress is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apart… · ☆100 · Updated 5 years ago
- Script to rotate webserver log file to AWS S3 · ☆29 · Updated 10 years ago
- Scalable String Similarity Joins in Python · ☆38 · Updated 6 months ago
- Web page segmentation and noise removal · ☆55 · Updated 11 months ago
- NER toolkit for HTML data · ☆257 · Updated 8 months ago
- An implementation of the multi-armed bandit optimization pattern as a Flask extension · ☆81 · Updated this week
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). · ☆24 · Updated 10 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors · ☆34 · Updated 9 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… · ☆170 · Updated 3 years ago
- For extracting measurements and related entities from text · ☆57 · Updated 4 years ago
- Extract, parse and populate templates from strings · ☆27 · Updated 5 years ago
- Common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text · ☆34 · Updated 8 years ago
- S3 Backups provides easy scripts that system administrators can use to backup data from programs like PostgreSQL, MySQL, Redis, etc. · ☆67 · Updated 6 years ago