Python library & examples for Masked Language Model Scoring (ACL 2020)
348 stars · Updated Dec 20, 2022 (3 years ago)
Alternatives and similar repositories for mlm-scoring
Users who are interested in mlm-scoring are comparing it to the libraries listed below.
- 12 stars · Updated Jun 10, 2021 (4 years ago)
- Language-model-based sentence scoring library · 309 stars · Updated Feb 9, 2022 (4 years ago)
- BERT score for text generation · 1,876 stars · Updated Jul 30, 2024 (last year)
- The Benchmark of Linguistic Minimal Pairs · 161 stars · Updated Dec 13, 2022 (3 years ago)
- LAnguage Model Analysis · 1,390 stars · Updated Jul 7, 2024 (last year)
- Code associated with the Don't Stop Pretraining ACL 2020 paper · 539 stars · Updated Nov 15, 2021 (4 years ago)
- 10 stars · Updated Sep 19, 2022 (3 years ago)
- EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation · 97 stars · Updated Mar 20, 2023 (2 years ago)
- BLEURT is a metric for Natural Language Generation based on transfer learning. · 786 stars · Updated Aug 4, 2023 (2 years ago)
- Code for LAMOL: LAnguage MOdeling for Lifelong Language Learning · 95 stars · Updated Aug 28, 2020 (5 years ago)
- Semi-supervised spoken language understanding (SLU) via self-supervised speech and language model pretraining · 12 stars · Updated Mar 23, 2021 (4 years ago)
- Pretrain and fine-tune ELECTRA with fastai and Hugging Face (results of the paper replicated!) · 331 stars · Updated Jan 10, 2024 (2 years ago)
- Properly handle position-dependent phones in a subword lexicon FST · 31 stars · Updated Oct 26, 2020 (5 years ago)
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… · 359 stars · Updated Feb 22, 2022 (4 years ago)
- A masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a… · 246 stars · Updated Sep 17, 2021 (4 years ago)
- Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020) · 104 stars · Updated Nov 26, 2022 (3 years ago)
- Pronunciation-assisted Subword Modeling · 31 stars · Updated May 30, 2019 (6 years ago)
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. · 96 stars · Updated Feb 9, 2023 (3 years ago)
- Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models. · 1,155 stars · Updated Feb 20, 2024 (2 years ago)
- [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining · 118 stars · Updated Jul 25, 2023 (2 years ago)
- Code and Data for ACL 2020 paper "Few-Shot NLG with Pre-Trained Language Model" · 190 stars · Updated May 23, 2025 (9 months ago)
- Word Discovery in Visually Grounded, Self-Supervised Speech Models · 26 stars · Updated Dec 4, 2023 (2 years ago)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList · 2,050 stars · Updated Jan 9, 2024 (2 years ago)
- ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences. · 458 stars · Updated Mar 26, 2024 (last year)
- 76 stars · Updated Mar 18, 2022 (3 years ago)
- Easily fine-tune GPT-2 to fill in missing text · 203 stars · Updated Dec 8, 2022 (3 years ago)
- Scripts and tools for doing unsupervised acceptability prediction. · 14 stars · Updated Mar 20, 2023 (2 years ago)
- Adversarial Natural Language Inference Benchmark · 399 stars · Updated May 12, 2022 (3 years ago)
- Convert words to numbers · 21 stars · Updated Apr 13, 2022 (3 years ago)
- Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagg… · 951 stars · Updated May 21, 2024 (last year)
- PyTorch code for end-to-end spoken language understanding (SLU) with ASR-based transfer learning · 231 stars · Updated Mar 23, 2021 (4 years ago)
- Artie Bias Corpus: an audio corpus + code for detecting demographic bias · 20 stars · Updated Jul 21, 2020 (5 years ago)
- Fast BPE · 679 stars · Updated Jun 18, 2024 (last year)
- Fast, general, and tested differentiable structured prediction in PyTorch · 1,123 stars · Updated Apr 20, 2022 (3 years ago)
- Implementation of different noise embeddings for noise-aware training of Kaldi acoustic models. · 13 stars · Updated Feb 13, 2021 (5 years ago)
- Enable RNNLM lattice rescoring with PyTorch [Kaldi] · 12 stars · Updated Jun 5, 2020 (5 years ago)
- This is an extension of the Kaldi speech recognition software that allows decoding of speech with hybrid word and phoneme graphs.… · 11 stars · Updated Feb 4, 2020 (6 years ago)
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… · 564 stars · Updated Jan 4, 2022 (4 years ago)
- PyTorch original implementation of Cross-lingual Language Model Pretraining. · 2,926 stars · Updated Feb 14, 2023 (3 years ago)