sayakpaul / count-tokens-hf-datasets

This project shows how to count the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
☆27 Updated 2 years ago

Alternatives and similar repositories for count-tokens-hf-datasets

Users who are interested in count-tokens-hf-datasets are comparing it to the libraries listed below.
