smashew / NameDatabases
Text databases of last names from various countries
☆280 · Updated 2 years ago
Alternatives and similar repositories for NameDatabases:
Users who are interested in NameDatabases are comparing it to the libraries listed below.
- ☆51 · Updated 10 months ago
- The Python library for names · ☆897 · Updated 2 weeks ago
- The largest English-language thesaurus · ☆292 · Updated 2 years ago
- Offline database of synonyms/thesaurus · ☆195 · Updated last year
- Lightning Fast Language Prediction 🚀 · ☆166 · Updated 6 years ago
- Locality-sensitive hashing using MinHash in Python/Cython to detect near-duplicate text documents · ☆284 · Updated last year
- Default English stopword lists from many different sources · ☆298 · Updated 2 years ago
- NamSor API v2 Python SDK - classify personal names accurately by gender, country of origin, or ethnicity · ☆36 · Updated last year
- A dataset of multinational first names and last names · ☆26 · Updated last year
- Bitextor generates translation memories from multilingual websites · ☆292 · Updated 5 months ago
- Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies · ☆850 · Updated 2 years ago
- ☆79 · Updated last year
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ · ☆189 · Updated 6 years ago
- Abydos NLP/IR library for Python · ☆185 · Updated 2 years ago
- Fast and customizable text tokenization library with BPE and SentencePiece support · ☆302 · Updated last week
- UNOFFICIAL Python API to interface with Parler.com · ☆53 · Updated 9 months ago
- Roll a Wikipedia dump into MongoDB · ☆243 · Updated 9 months ago
- Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python · ☆268 · Updated last year
- See https://meta.wikimedia.org/wiki/Research:Modeling_Talk_Page_Abuse · ☆152 · Updated 4 years ago
- Model training tool for MITIE · ☆79 · Updated 9 years ago
- SCOWL (and friends) · ☆419 · Updated 2 weeks ago
- Python Pushshift.io API Wrapper (for comment/submission search) · ☆361 · Updated 2 years ago
- French stopwords collection · ☆96 · Updated 5 years ago
- Example scripts for the pushshift dump files · ☆357 · Updated 2 weeks ago
- Hate speech dataset from Stormfront forum manually labelled at sentence level · ☆171 · Updated 4 years ago
- Novel character relationship analytics system · ☆11 · Updated 8 years ago
- A dataset of popular forenames and surnames by country · ☆32 · Updated last year
- Python bindings to libpostal for fast international address parsing/normalization · ☆805 · Updated 2 months ago
- Tools to work with the big Reddit JSON data dump · ☆253 · Updated 9 months ago
- Compact Language Detector 2 · ☆859 · Updated 3 years ago