malteos / llm-datasets

A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
53 · Updated 3 months ago

Related projects

Alternatives and complementary repositories for llm-datasets