smashew / NameDatabases
Text databases of last names from various countries
☆281 Updated 3 years ago
Alternatives and similar repositories for NameDatabases
Users who are interested in NameDatabases are comparing it to the libraries listed below.
- ☆52 Updated last year
- The Python library for names. ☆964 Updated 8 months ago
- All languages stopwords collection ☆471 Updated last year
- A dataset of popular forenames and surnames by country ☆52 Updated 2 years ago
- Word lists from the web. ☆93 Updated 9 years ago
- English stopwords collection ☆166 Updated 9 years ago
- Machine-readable lists of lemma-token pairs in 23 languages. ☆353 Updated 3 years ago
- SCOWL (and friends). ☆455 Updated 4 months ago
- Offline database of synonyms/thesaurus ☆206 Updated last year
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ ☆205 Updated 7 years ago
- Default English stopword lists from many different sources ☆311 Updated 2 years ago
- List of common stop words in various languages. ☆340 Updated last month
- A Python parser for MediaWiki wikicode ☆852 Updated 5 months ago
- Heuristic based boilerplate removal tool ☆809 Updated 9 months ago
- Full list of US states and cities ☆286 Updated last year
- 📦 A list, huge one (~200K) of human male/female first/last names. ☆55 Updated 2 years ago
- ☆83 Updated 5 months ago
- Snowball compiler and stemming algorithms ☆824 Updated this week
- A CSV file with US given names (first name) and their associated nicknames or diminutive names. ☆307 Updated 4 months ago
- Compact Language Detector 2 ☆884 Updated 4 years ago
- Index Common Crawl archives in tabular format ☆124 Updated last week
- Stopwords for 50 languages in JSON format ☆431 Updated 2 years ago
- A set of utility scripts to process Wikipedia related data ☆38 Updated 3 years ago
- Gather modern English word frequencies from all enwiki articles. ☆227 Updated last year
- Process Common Crawl data with Python and Spark ☆449 Updated last month
- Streaming WARC/ARC library for fast web archive IO ☆441 Updated last year
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆390 Updated last week
- The Open English WordNet ☆673 Updated this week
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆418 Updated 10 months ago
- 📗 Score text readability using a number of formulas: Flesch-Kincaid Grade Level, Gunning Fog, ARI, Dale Chall, SMOG, and more ☆396 Updated last year