oscar-project / goclassy
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
☆86 Updated 3 years ago
Alternatives and similar repositories for goclassy:
Users who are interested in goclassy are comparing it to the libraries listed below:
- ☆74 Updated 3 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆48 Updated 3 years ago
- ☆87 Updated 2 years ago
- A simple neural truecaser written in pytorch and allennlp. ☆33 Updated 8 months ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ☆33 Updated 4 years ago
- LM Pretraining with PyTorch/TPU ☆134 Updated 5 years ago
- A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+ ☆37 Updated 4 years ago
- A tiny BERT for low-resource monolingual models ☆31 Updated 5 months ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆51 Updated 3 months ago
- Dataset of sentences from Hindi stories tagged with different emotion tags ☆10 Updated 5 years ago
- Generate BERT vocabularies and pretraining examples from Wikipedias ☆18 Updated 4 years ago
- Viewer for the 🤗 datasets library. ☆84 Updated 3 years ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 Updated last year
- Code for the paper "Latent Relation Language Models" at AAAI-20. ☆41 Updated 4 years ago
- codebase for the Text-based NP Enrichment (TNE) paper ☆20 Updated last year
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆14 Updated 6 months ago
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime. ☆126 Updated 4 years ago
- Build a dialog dataset from online books in many languages ☆72 Updated 2 years ago
- Question-answers, collected from Google ☆126 Updated 3 years ago
- A Benchmark Dataset for Understanding Disfluencies in Question Answering ☆62 Updated 3 years ago
- Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation" ☆40 Updated 6 years ago
- Code and Data for Evaluation WG ☆41 Updated 2 years ago
- Assessing syntactic abilities of BERT ☆39 Updated 5 years ago
- CoNLL 2005 SRL (Semantic Role Labeling) evaluation script, implemented in Python ☆8 Updated 6 years ago
- ☆11 Updated 4 years ago
- numeric fused-head identification and resolution ☆33 Updated 5 years ago
- Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models' ☆17 Updated 2 years ago
- Implementation of Z-BERT-A: a zero-shot pipeline for unknown intent detection. ☆39 Updated last year
- Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis. ☆146 Updated 3 years ago