kutvonenaki / cc100-sentencepiece
Pretrained SentencePiece tokenizers for English and Japanese, trained on Common Crawl data at various vocabulary sizes. Also includes a development environment for adding further languages.
★10 Updated 3 years ago
Alternatives and similar repositories for cc100-sentencepiece
Users that are interested in cc100-sentencepiece are comparing it to the libraries listed below
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ★58 Updated 2 years ago
- Helper scripts and notes that were used while porting various NLP models ★46 Updated 3 years ago
- Large-scale query-focused multi-document Summarization dataset ★10 Updated 3 years ago
- Code for GenAug: Data Augmentation for Finetuning Text Generators. ★28 Updated 3 years ago
- A package for fine-tuning Transformers with TPUs, written in TensorFlow 2.0+ ★38 Updated 4 years ago
- Data and code accompanying the paper "Intent Detection with WikiHow" ★10 Updated 4 years ago
- Training a model without a dataset for natural language inference (NLI) ★25 Updated 4 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ★11 Updated 2 months ago
- A web application that interfaces two GEC systems. [web instance is down] ★31 Updated 11 months ago
- ★21 Updated 3 years ago
- Corresponding code repo for the paper at COLING 2020 - ARGMIN 2020: "DebateSum: A large-scale argument mining and summarization dataset" ★54 Updated 3 years ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ★23 Updated 4 months ago
- Source code of Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction ★43 Updated 4 years ago
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction" ★18 Updated 4 years ago
- NewsQuizQA is a quiz-style question-answer dataset used for generating quiz questions about the news ★35 Updated 4 years ago
- Open source library for few shot NLP ★78 Updated 2 years ago
- Code and dataset "ZEST" from "Learning from task descriptions", Weller et al, EMNLP 2020 ★17 Updated 4 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in PyTorch ★76 Updated 4 years ago
- Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0. ★103 Updated 3 years ago
- ZS4IE: A Toolkit for Zero-Shot Information Extraction with Simple Verbalizations ★28 Updated 3 years ago
- The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021) ★53 Updated 3 years ago
- GrammarTagger - A Neural Multilingual Grammar Profiler for Language Learning ★28 Updated 4 years ago
- ★22 Updated 3 years ago
- LAReQA is a challenging benchmark for evaluating language agnostic answer retrieval from a multilingual candidate pool. This repository c… ★14 Updated 5 years ago
- Using BERT for doing the task of Conditional Natural Language Generation by fine-tuning pre-trained BERT on custom dataset. ★41 Updated 5 years ago
- A question-answering dataset with a focus on subjective information ★45 Updated last year
- Paraphrase Generation model using pair-wise discriminator loss ★45 Updated 4 years ago
- Multilingual abstractive summarization dataset extracted from WikiHow. ★92 Updated 4 months ago
- Code for Paper "Target-oriented Fine-tuning for Zero-Resource Named Entity Recognition" ★21 Updated 2 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ★30 Updated 3 years ago