An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
☆86 · Apr 21, 2021 · Updated 4 years ago
Alternatives and similar repositories for goclassy
Users who are interested in goclassy are comparing it to the libraries listed below.
- The pipeline for the OSCAR corpus · ☆176 · Nov 9, 2025 · Updated 5 months ago
- Converter from UD-trees to BART representation · ☆35 · Mar 6, 2024 · Updated 2 years ago
- Ukrainian ELECTRA model · ☆12 · Mar 11, 2023 · Updated 3 years ago
- German GPT-2 model · ☆32 · Aug 17, 2021 · Updated 4 years ago
- Analytic platform for the HAL research archive (in development) · ☆12 · Oct 2, 2020 · Updated 5 years ago
- Tools to download and clean up Common Crawl data · ☆1,040 · Apr 25, 2023 · Updated 2 years ago
- Load, build and explore Patstat using the Google Cloud Platform · ☆10 · Jan 19, 2019 · Updated 7 years ago
- Specification of a stand-off element for the TEI guidelines · ☆12 · Apr 29, 2021 · Updated 4 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow