onesuper / HuggingFace-Datasets-Text-Quality-Analysis

Retrieves parquet files from Hugging Face and uses pandas to identify and quantify junk data, duplication, contamination, and biased content in a dataset
51 · Updated last year
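
As a rough illustration of the kind of check described above, the sketch below counts empty, very short, and duplicated rows in a text column with pandas. It is not taken from the repository: the helper name `text_quality_report`, the `text` column name, and the `min_chars` threshold are all assumptions made for this example, and the sample frame stands in for a parquet file that would normally be loaded with `pd.read_parquet`.

```python
import pandas as pd


def text_quality_report(df: pd.DataFrame, text_col: str = "text", min_chars: int = 5) -> dict:
    """Count simple quality problems in a text column (hypothetical helper)."""
    # Treat missing values as empty strings and trim surrounding whitespace.
    text = df[text_col].fillna("").astype(str).str.strip()
    # Normalize case and inner whitespace so trivially different copies match.
    normalized = text.str.lower().str.replace(r"\s+", " ", regex=True)

    empty = text.eq("")
    too_short = text.str.len().lt(min_chars) & ~empty
    # duplicated() flags every repeat after the first occurrence;
    # empty rows are already counted separately, so exclude them here.
    duplicate = normalized.duplicated() & ~empty

    total = len(df)
    return {
        "rows": total,
        "empty": int(empty.sum()),
        "too_short": int(too_short.sum()),
        "duplicates": int(duplicate.sum()),
        "duplicate_ratio": float(duplicate.sum() / total) if total else 0.0,
    }


# Small in-memory sample; a real run would use df = pd.read_parquet(<path>).
sample = pd.DataFrame(
    {"text": ["Hello world", "hello   world", "", None, "ok", "A longer unique sentence."]}
)
report = text_quality_report(sample)
print(report)
```

This only covers exact duplicates after light normalization. Near-duplicate detection, benchmark contamination checks, and bias scoring would need additional techniques that are not shown here.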

Alternatives and similar repositories for HuggingFace-Datasets-Text-Quality-Analysis:

Users who are interested in HuggingFace-Datasets-Text-Quality-Analysis are comparing it to the libraries listed below.