malteos / llm-datasets

A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
56 · Updated 5 months ago

Alternatives and similar repositories for llm-datasets:

Users who are interested in llm-datasets are comparing it to the libraries listed below.