mailgun / forge
email dataset for email signature parsing
☆55 Updated 9 years ago
Alternatives and similar repositories for forge
Users that are interested in forge are comparing it to the libraries listed below
- A Python library to detect and extract listing data from HTML pages. ☆108 Updated 8 years ago
- Lightning Fast Language Prediction ☆167 Updated last month
- Traptor -- A distributed Twitter feed ☆26 Updated 3 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 Updated 3 years ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆155 Updated last month
- Python library to infer date format from examples ☆45 Updated 3 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago
- Skinfer is a tool for inferring and merging JSON schemas ☆140 Updated last year
- Demo code for learning_text_transformer ☆25 Updated 10 years ago
- Supervised learning for novelty detection in text ☆78 Updated 9 years ago
- remove signature blocks from emails ☆87 Updated 6 years ago
- Find which links on a web page are pagination links ☆29 Updated 8 years ago
- BUNT is a Bot UNderstanding Testbed ☆36 Updated 8 years ago
- A spell-checker extending Peter Norvig's with multi-typo correction, hamming distance weighting, and more. ☆98 Updated 5 years ago
- Reduction is a python script which automatically summarizes a text by extracting the sentences which are deemed to be most important. ☆54 Updated 10 years ago
- Parse, normalize and render postal addresses. ☆184 Updated 2 years ago
- Tools to manipulate and extract data from wikipedia dumps ☆46 Updated 12 years ago
- Modularly extensible semantic metadata validator ☆84 Updated 9 years ago
- A Python library for extracting titles, images, descriptions and canonical urls from HTML. ☆151 Updated 5 years ago
- Language detection extension for spaCy 2.0+ ☆113 Updated 6 years ago
- Web page segmentation and noise removal ☆55 Updated last year
- A Python library for extracting semantic information from text, such as dates and numbers. ☆77 Updated 3 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated this week
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors ☆35 Updated 10 years ago
- With alexafsm, developers can model dialog agents with first-class concepts such as states, attributes, transition, and actions. alexafsm… ☆111 Updated 2 years ago
- NER toolkit for HTML data ☆259 Updated last year
- Server/Client around Spacy to load spacy only once ☆46 Updated 7 years ago
- Python bindings to the Compact Language Detector ☆33 Updated 5 years ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe… ☆55 Updated last year
- An automated ingestion service for blogs to construct a corpus for NLP research. ☆86 Updated 7 years ago