harmening / signature_extraction
NLP - Library for splitting email content into a human-written body and an automatically appended signature.
★23 · Updated 5 years ago
Related projects:
- Text analysis for automatic bookmarking/keyword extraction ★18 · Updated 7 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ★57 · Updated 7 months ago
- Building a Job Dataset ★21 · Updated 2 years ago
- Quora Question Scraper - Find & Export relevant Questions 10x faster ★16 · Updated 4 years ago
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation. ★142 · Updated 7 months ago
- Crawl sites for RSS, Atom, and JSON feeds. ★59 · Updated 3 months ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ★42 · Updated 5 years ago
- PST extraction and analytic pipeline ★37 · Updated 6 years ago
- Web Crawlers orchestration framework that lets you create datasets from multiple web sources using yaml configurations. ★32 · Updated 9 months ago
- GraphiPy: Universal Social Data Extractor ★79 · Updated last year
- A text processing tool including tag (HTML, URL, email) extraction and removal, punctuation normalization, simple segmentation, and so on… ★11 · Updated last year
- Python package for performing deduplication using flexible text matching and cleaning in pandas DataFrames ★25 · Updated 3 years ago
- Language detection using spaCy and fastText ★53 · Updated 9 months ago
- API - extract a list of keywords from a text. ★18 · Updated 7 years ago
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data. ★54 · Updated last month
- A financial disclosure data extraction tool. ★13 · Updated last year
- Source real estate prices from the Common Crawl. ★27 · Updated 5 years ago
- A simple machine learning package to cluster keywords in higher-level groups. ★17 · Updated 2 years ago
- Scalable String Similarity Joins in Python ★39 · Updated 2 months ago
- Parsing resumes in PDF format from LinkedIn ★65 · Updated 7 years ago
- new skills taxonomy using TextKernel data ★29 · Updated last year
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ★51 · Updated 3 weeks ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ★53 · Updated 2 months ago
- how hard is it to get a list of all local news sites in the United States (LOL) ★8 · Updated 4 years ago
- Matches a category of Google's Taxonomy to product that is described in any kind of text data ★57 · Updated 6 years ago
- Open Source Thesaurus of Job Titles in US English ★136 · Updated 2 years ago
- SpacyV3 Text Categorizer Tutorial ★17 · Updated 3 years ago
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec. ★66 · Updated this week
- Find rss, atom, xml, and rdf feeds on webpages ★30 · Updated last year
- https://duyet.github.io/related-skills-visualization/index.html ★11 · Updated 4 years ago