oscar-project / goclassy
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
⭐ 86 · Updated 4 years ago
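As a rough illustration of what an "asynchronous concurrent pipeline for classifying" text can look like, here is a minimal Python asyncio sketch. It is not goclassy's code. It assumes one document per input line, and it uses a hypothetical `toy_classify` function as a stand-in for a fastText language-identification model. The names `reader`, `worker`, and `run_pipeline` are illustrative only. A reader stage feeds a bounded queue, several workers classify concurrently, and the results are collected back in input order.

```python
from __future__ import annotations

import asyncio
from typing import Callable


def toy_classify(line: str) -> str:
    # Hypothetical stand-in for a fastText language-ID model:
    # label pure-ASCII lines as English, everything else as "other".
    return "__label__en" if line.isascii() else "__label__other"


async def reader(lines: list[str], in_q: asyncio.Queue, n_workers: int) -> None:
    # Stage 1: push (index, line) pairs, then one stop sentinel per worker.
    for idx, line in enumerate(lines):
        await in_q.put((idx, line))  # blocks when the bounded queue is full
    for _ in range(n_workers):
        await in_q.put(None)


async def worker(classify: Callable[[str], str],
                 in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    # Stage 2: classify lines until a sentinel arrives, then signal completion.
    while True:
        item = await in_q.get()
        if item is None:
            await out_q.put(None)
            return
        idx, line = item
        await out_q.put((idx, line, classify(line)))


async def run_pipeline(lines: list[str], classify: Callable[[str], str],
                       n_workers: int = 4) -> list[tuple[str, str]]:
    in_q: asyncio.Queue = asyncio.Queue(maxsize=64)  # bounded: backpressure
    out_q: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(reader(lines, in_q, n_workers))]
    tasks += [asyncio.create_task(worker(classify, in_q, out_q))
              for _ in range(n_workers)]

    # Stage 3: collect results until every worker has reported completion.
    results: dict[int, tuple[str, str]] = {}
    finished = 0
    while finished < n_workers:
        item = await out_q.get()
        if item is None:
            finished += 1
            continue
        idx, line, label = item
        results[idx] = (line, label)

    await asyncio.gather(*tasks)
    # Restore the original input order using the stored indices.
    return [results[i] for i in range(len(lines))]


if __name__ == "__main__":
    docs = ["hello world", "bonjour à tous", "ok"]
    print(asyncio.run(run_pipeline(docs, toy_classify, n_workers=2)))
```

The bounded input queue keeps the reader from racing arbitrarily far ahead of the classifiers. Carrying the line index through the pipeline lets workers finish out of order while the final output still matches the input order.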
Alternatives and similar repositories for goclassy:
Users interested in goclassy are comparing it to the libraries listed below:
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP · ⭐ 58 · Updated 2 years ago
- ⭐ 76 · Updated 3 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) · ⭐ 48 · Updated 3 years ago
- A tiny BERT for low-resource monolingual models · ⭐ 31 · Updated 7 months ago
- A simple neural truecaser written in PyTorch and AllenNLP. · ⭐ 33 · Updated 10 months ago
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning · ⭐ 27 · Updated 4 years ago
- BERT models for many languages created from Wikipedia texts · ⭐ 33 · Updated 4 years ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … · ⭐ 51 · Updated 5 months ago
- ⭐ 87 · Updated 2 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spacy provide state of … · ⭐ 61 · Updated 4 years ago
- Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0. · ⭐ 102 · Updated 2 years ago
- Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation" · ⭐ 40 · Updated 6 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale · ⭐ 154 · Updated last year
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation · ⭐ 14 · Updated 8 months ago
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime. · ⭐ 126 · Updated 4 years ago
- LM Pretraining with PyTorch/TPU · ⭐ 134 · Updated 5 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models · ⭐ 31 · Updated 4 years ago
- Viewer for the 🤗 datasets library. · ⭐ 84 · Updated 3 years ago
- Execute arbitrary SQL queries on 🤗 Datasets · ⭐ 32 · Updated last year
- A web interface to understand language-specific BERT models · ⭐ 17 · Updated last year
- ✨ Web interface for NeuralCoref coreference resolution · ⭐ 35 · Updated last year
- This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild" accepted at EMNLP-2020… · ⭐ 33 · Updated 4 years ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal… · ⭐ 81 · Updated 3 years ago
- A framework for building semantic parsers (including neural module networks) with AllenNLP, built by the authors of AllenNLP · ⭐ 108 · Updated 3 years ago
- A Benchmark Dataset for Understanding Disfluencies in Question Answering · ⭐ 62 · Updated 3 years ago
- Code for pre-training CharacterBERT models (as well as BERT models). · ⭐ 34 · Updated 3 years ago
- Generate BERT vocabularies and pretraining examples from Wikipedias