AI21Labs / lm-evaluationLinks

Evaluation suite for large-scale language models.

☆128

Alternatives and similar repositories for lm-evaluation

Users that are interested in lm-evaluation are comparing it to the libraries listed below

Sorting:

CarperAI / InstructGPT
For experiments involving instruct gpt. Currently used for documenting open research questions.
☆71Updated 3 years ago
leogao2 / lm_dataformat
☆78Updated 2 years ago
HendrikStrobelt / LMdiff
A diff tool for language models
☆44Updated last year
EleutherAI / openwebtext2
☆92Updated 3 years ago
zphang / minimal-opt
☆67Updated 3 years ago
stas00 / porting
Helper scripts and notes that were used while porting various nlp models
☆48Updated 3 years ago
huggingface / olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
☆178Updated 2 years ago
Rallio67 / language-model-agents
Experiments with generating opensource language model assistants
☆97Updated 2 years ago
google-deepmind / streamingqa
☆49Updated 2 years ago
microsoft / xtreme-distil-transformers
XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale
☆157Updated last year
google-deepmind / pg19
☆250Updated 5 years ago
facebookresearch / EditEval
An instruction-based benchmark for text improvements.
☆143Updated 3 years ago
EleutherAI / stackexchange-dataset
Python tools for processing the stackexchange data dumps into a text dataset for Language Models
☆85Updated 2 years ago
salesforce / TaiChi
Open source library for few shot NLP
☆78Updated 2 years ago
krandiash / quinine
A library to create and manage configuration files, especially for machine learning projects.
☆79Updated 3 years ago
salesforce / Converse
☆132Updated 2 years ago
shayne-longpre / a-pretrainers-guide
☆72Updated 2 years ago
EleutherAI / DeeperSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
☆171Updated 2 months ago
EleutherAI / semantic-memorization
☆44Updated last year
mega002 / lm-debugger
The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models.
☆180Updated 3 years ago
stanfordnlp / ColBERT-QA
Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21)
☆40Updated 4 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆96Updated 2 years ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆35Updated 2 years ago
facebookresearch / coder_reviewer_reranking
Official code release for the paper Coder Reviewer Reranking for Code Generation.
☆45Updated 2 years ago
facebookresearch / dynalab
The Python library with command line tools to interact with Dynabench(https://dynabench.org/), such as uploading models.
☆55Updated 3 years ago
AI21Labs / sense-bert
This is the code for loading the SenseBERT model, described in our paper from ACL 2020.
☆46Updated 2 years ago
martiansideofthemoon / rankgen
Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx…
☆138Updated 2 years ago
bigscience-workshop / architecture-objective
☆98Updated 2 years ago
google-research / t5x_retrieval
☆101Updated 2 years ago
facebookresearch / concurrentqa
This repo contains data and code for the paper "Reasoning over Public and Private Data in Retrieval-Based Systems."
☆46Updated last year