sayakpaul / count-tokens-hf-datasets

This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
☆24 Updated 2 years ago

Alternatives and similar repositories for count-tokens-hf-datasets:

Users who are interested in count-tokens-hf-datasets are comparing it to the libraries listed below.