sayakpaul / count-tokens-hf-datasets

This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
24 · Updated 2 years ago
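
The counting idea in the description can be sketched as a map-then-sum: turn each text record into a token count, then add the counts. The snippet below is a minimal illustration in plain Python and is not taken from the repository. The function names `count_tokens` and `total_tokens` are hypothetical, and whitespace splitting is an assumed stand-in for whatever tokenizer the project actually uses. In an Apache Beam pipeline the same two steps would typically be expressed with `beam.Map` and `beam.CombineGlobally(sum)`.

```python
from typing import Iterable


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return len(text.split())


def total_tokens(records: Iterable[str]) -> int:
    # Map step: one count per record. Combine step: sum all counts.
    return sum(count_tokens(record) for record in records)


if __name__ == "__main__":
    sample = ["hello world", "apache beam counts tokens", ""]
    print(total_tokens(sample))  # prints 6
```

Because each record is counted independently and addition is associative, the work can be split across workers and the partial sums merged afterwards, which is what makes the problem a good fit for a distributed runner.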

Related projects

Alternatives and complementary repositories for count-tokens-hf-datasets