Calculate perplexity on a text with pre-trained language models. Supports MLM (e.g., DeBERTa), recurrent LM (e.g., GPT3), and encoder-decoder LM (e.g., Flan-T5).
☆167 · Jun 20, 2025 · Updated 9 months ago
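As a rough illustration of the quantity the description refers to, perplexity is the exponential of the negative mean per-token log-probability. The sketch below is not lmppl's API: the function name `perplexity` and the example probabilities are hypothetical, and the per-token log-probabilities are assumed to have already been obtained from some model. For a masked LM they would typically come from masking each token in turn, which gives a pseudo-perplexity.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities.

    Hypothetical helper for illustration; not part of lmppl.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# Made-up example: a model that assigns probability 0.25 to each of 4 tokens.
uniform = [math.log(0.25)] * 4
print(perplexity(uniform))  # approximately 4

# A text the model finds more likely gets a lower perplexity.
confident = [math.log(0.9)] * 4
print(perplexity(confident) < perplexity(uniform))
```

Lower values mean the model finds the text more predictable, which is why perplexity is often used to rank candidate texts against each other.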
Alternatives and similar repositories for lmppl
Users that are interested in lmppl are comparing it to the libraries listed below.
- ☆28 · Mar 20, 2024 · Updated 2 years ago
- Analyzing mBERT's multilinguality in a small laboratory setting ☆13 · Jun 12, 2023 · Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 · Nov 21, 2022 · Updated 3 years ago
- Japanese grammatical error correction tool ☆29 · Jun 22, 2022 · Updated 3 years ago
- Cluster paraphrases by word sense ☆12 · Jan 3, 2019 · Updated 7 years ago
- ☆15 · Nov 20, 2025 · Updated 4 months ago
- Using BERT to calculate perplexity ☆20 · Dec 20, 2019 · Updated 6 years ago
- Lite Self-Training ☆30 · Jul 25, 2023 · Updated 2 years ago
- Word acquisition in neural language models (TACL 2022). ☆20 · Jan 30, 2025 · Updated last year
- UD Greek ☆22 · Dec 5, 2025 · Updated 3 months ago
- Japanese LLaMa experiment ☆54 · Dec 27, 2025 · Updated 2 months ago
- ☆13 · Dec 1, 2021 · Updated 4 years ago
- R library for accessing data from everypolitician.org ☆20 · Apr 24, 2018 · Updated 7 years ago
- CSS-LM: Contrastive Semi-supervised Fine-tuning of Pre-trained Language Models ☆12 · Jul 1, 2023 · Updated 2 years ago
- Code Roberta version of RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder ☆10 · Mar 16, 2023 · Updated 3 years ago
- Official implementation of BPA (CVPR 2022) ☆13 · Jun 17, 2022 · Updated 3 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆15 · May 3, 2023 · Updated 2 years ago
- [NAACL'22] TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning ☆94 · Jun 8, 2022 · Updated 3 years ago
- Benchmarking Large Language Models ☆105 · Jun 20, 2025 · Updated 9 months ago
- ☆24 · Nov 22, 2022 · Updated 3 years ago
- [AAAI 2025] Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity ☆30 · Mar 17, 2025 · Updated last year
- This repository provides the code and dataset for the work published in the paper - Modeling Label Semantics for Predicting Emotional Rea… ☆26 · Nov 8, 2020 · Updated 5 years ago
- albumentations test ☆11 · Jun 23, 2020 · Updated 5 years ago
- Forked repo from https://github.com/EleutherAI/lm-evaluation-harness/commit/1f66adc ☆82 · Feb 28, 2024 · Updated 2 years ago
- ☆21 · Mar 28, 2022 · Updated 3 years ago
- A simple semi-supervised approach for creating huggingface data script loaders and uploading them to the hub. ☆11 · Jun 23, 2024 · Updated last year
- PyTorch tutorial for M1 students. This repository includes Encoder-Decoder model and classification model building code. ☆12 · Jun 1, 2022 · Updated 3 years ago
- Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting ☆17 · Nov 30, 2021 · Updated 4 years ago
- Arabic Word-Embedding (Word2vec) model training from Wikipedia articles ☆11 · Dec 13, 2018 · Updated 7 years ago
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical … ☆14 · Jan 25, 2026 · Updated 2 months ago
- Thanks to auspicious3000's great work! https://github.com/auspicious3000/autovc This is the implementation of generating mel-spectrogram fr… ☆13 · Oct 21, 2019 · Updated 6 years ago
- Official Implementation of "Simulating Environments with Reasoning Models for Agent Training" ☆60 · Feb 18, 2026 · Updated last month
- Code for "A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies." ☆27 · Feb 2, 2022 · Updated 4 years ago
- End-to-end codebase for finetuning LLMs (LLaMA 2, 3, etc.) with or without DP ☆16 · Sep 23, 2024 · Updated last year
- An accurate multilingual word aligner based on LaBSE ☆24 · Oct 25, 2023 · Updated 2 years ago
- coFR: COreference resolution tool for FRench (and singletons). ☆26 · Jun 7, 2020 · Updated 5 years ago
- Optimization methods ☆30 · Jan 5, 2015 · Updated 11 years ago
- Entitypedia is an Extended Named Entity Dictionary from Wikipedia. ☆13 · Dec 7, 2022 · Updated 3 years ago
- Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models" ☆27 · Dec 21, 2025 · Updated 3 months ago