huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
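The description above refers to customizable pipeline processing blocks. Here is a minimal sketch of that idea in plain Python. It is not datatrove's actual API: `Document`, `reader`, `length_filter`, `lowercase`, and `run_pipeline` are hypothetical names chosen for illustration. Each block consumes a stream of documents and yields a transformed stream, so blocks can be chained in any order.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List


@dataclass
class Document:
    # Hypothetical minimal document record: text plus an id and free-form metadata.
    text: str
    id: str
    metadata: dict = field(default_factory=dict)


# A block maps an input stream of documents to an output stream (hypothetical type).
Block = Callable[[Iterable[Document]], Iterator[Document]]


def reader(texts: List[str]) -> Block:
    # Source block: ignores its input stream and emits one Document per string.
    def run(_: Iterable[Document]) -> Iterator[Document]:
        for i, t in enumerate(texts):
            yield Document(text=t, id=str(i))
    return run


def length_filter(min_chars: int) -> Block:
    # Filter block: drops documents whose text is shorter than min_chars.
    def run(docs: Iterable[Document]) -> Iterator[Document]:
        for doc in docs:
            if len(doc.text) >= min_chars:
                yield doc
    return run


def lowercase() -> Block:
    # Transform block: lowercases each document's text in place.
    def run(docs: Iterable[Document]) -> Iterator[Document]:
        for doc in docs:
            doc.text = doc.text.lower()
            yield doc
    return run


def run_pipeline(blocks: List[Block]) -> List[Document]:
    # Chain the blocks lazily, then materialize the final stream.
    stream: Iterable[Document] = iter(())
    for block in blocks:
        stream = block(stream)
    return list(stream)


docs = run_pipeline([reader(["Hello World", "hi", "Data Trove"]), length_filter(5), lowercase()])
print([d.text for d in docs])  # prints ['hello world', 'data trove']
```

Because every block shares the same stream-in, stream-out signature, swapping the filter's threshold or reordering blocks requires no changes to the other blocks. The sketch runs in a single local process, and its generator chaining is illustrative only.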

Related projects

Alternatives and complementary repositories for datatrove