togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
4,571 · Updated last month

Related projects

Alternatives and complementary repositories for RedPajama-Data