opendatalab / WanJuan2.0-WanJuan-CC

WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
13 · Updated 10 months ago

Alternatives and similar repositories for WanJuan2.0-WanJuan-CC:

Users who are interested in WanJuan2.0-WanJuan-CC are comparing it to the repositories listed below.