google-research-datasets / common-crawl-domain-names

Corpus of domain names scraped from Common Crawl and manually annotated to add word boundaries (e.g. "commoncrawl" to "common crawl").
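In the "commoncrawl" to "common crawl" example, the annotation only inserts spaces and leaves every character unchanged. The minimal Python sketch below assumes that each record pairs a raw domain label with its spaced form. The dataset's actual file layout is not described here. The sketch recovers the character offsets where boundaries were added and rebuilds the spaced form from those offsets. The names `boundary_offsets` and `apply_boundaries` are hypothetical helpers, not part of the dataset.

```python
def boundary_offsets(raw: str, segmented: str) -> list[int]:
    """Return the character offsets in `raw` where a word boundary was inserted.

    Assumes (hypothetically) that the annotation only inserts single spaces and
    never changes the characters themselves, as in "commoncrawl" -> "common crawl".
    """
    words = segmented.split(" ")
    # Removing the spaces must give back the original, unsegmented label.
    if "".join(words) != raw:
        raise ValueError(f"{segmented!r} is not a segmentation of {raw!r}")
    offsets = []
    position = 0
    # Every word except the last one ends at a boundary.
    for word in words[:-1]:
        position += len(word)
        offsets.append(position)
    return offsets


def apply_boundaries(raw: str, offsets: list[int]) -> str:
    """Rebuild the spaced form from a raw label and its boundary offsets."""
    pieces = []
    start = 0
    for end in offsets + [len(raw)]:
        pieces.append(raw[start:end])
        start = end
    return " ".join(pieces)


if __name__ == "__main__":
    offsets = boundary_offsets("commoncrawl", "common crawl")
    print(offsets)  # [6]
    print(apply_boundaries("commoncrawl", offsets))  # common crawl
```

Storing offsets instead of the spaced string makes the round-trip check explicit. Any annotation that changes a character, rather than only adding a space, is rejected.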
17 · Updated 4 years ago

Related projects

Alternatives and complementary repositories for common-crawl-domain-names