huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
2,167 · Updated this week

Alternatives and similar repositories for datatrove:

Users who are interested in datatrove are comparing it to the libraries listed below.