commoncrawl / web-languages
Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.
★68 · Updated last week
Alternatives and similar repositories for web-languages
Users who are interested in web-languages are comparing it to the repositories listed below.
- 🔬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ★185 · Updated last month
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. ★64 · Updated last year
- Libraries, Archives and Museums (LAM) ★88 · Updated 3 years ago
- Efficiently find the best-suited language model (LM) for your NLP task ★133 · Updated 5 months ago
- Small python package to measure OCR quality and other related metrics. ★25 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ★111 · Updated last year
- ★67 · Updated last year
- Generalist and Lightweight Model for Text Classification