opendatalab / WanJuan2.0-WanJuan-CC

WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
12 · Updated 7 months ago

Related projects

Alternatives and complementary repositories for WanJuan2.0-WanJuan-CC