opendatalab/WanJuan2.0-WanJuan-CC

opendatalab / WanJuan2.0-WanJuan-CC

WanJuan-CC is a high-quality dataset derived from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.

☆14

Alternatives and similar repositories for WanJuan2.0-WanJuan-CC

Users who are interested in WanJuan2.0-WanJuan-CC are comparing it to the repositories listed below.

opendatalab / dsdl-docs
Data Set Description Language Specification (DSDL, a next-generation AI dataset description language)
☆46 • May 29, 2024 • Updated 2 years ago
dataaug / chatbot_multiround
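Multi-turn Chinese chatbot fine-tuned from GPT2; over 1.1 million chat samples were cleaned, and replies are filtered by semantic similarity and text Jaccard similarity.
☆22 • Nov 13, 2021 • Updated 4 years ago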
opendatalab / opendatalab-python-sdk
SDK of OpenDataLab - https://opendatalab.org.cn
☆60 • Jul 31, 2025 • Updated 11 months ago
veeicwgy / ip-publisher
Knowledge-base-driven article generation, fact audit, and 7-platform publish packs for WeChat, Xiaohongshu, Zhihu, Juejin, CSDN, Toutiao,…
☆26 • Apr 23, 2026 • Updated 3 months ago
1Reminding / MediChain-LLM-Agent
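LLM-based medical agent: provides accurate medication recommendations and generates medical diagnosis reports (including medication recommendations, condition analysis, and advice, formatted as a standard diagnosis report); includes all front-end and back-end code, the requirements document and project description, and the programmers' daily logs. Nankai University, class of '22 practical training, artificial intelligence.
☆18 • Aug 28, 2024 • Updated last year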
opendatalab / VIGC
AAAI 2024: Visual Instruction Generation and Correction
☆97 • Feb 4, 2024 • Updated 2 years ago
shawnricecake / search-llm
[NeurIPS 2024] Search for Efficient LLMs
☆16 • Jan 16, 2025 • Updated last year
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37 • Nov 27, 2024 • Updated last year
loujie0822 / CLUEDatasetSearch
Search all Chinese NLP datasets, plus commonly used English NLP datasets
☆14 • Mar 1, 2020 • Updated 6 years ago
OpenGVLab / DiffAgent
[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
☆19 • Apr 16, 2024 • Updated 2 years ago
nigelargriffiths / InfluxDB-C-client
Save stats from a C program to an InfluxDB database in a simple way. Only 12 functions in total.
☆15 • Dec 19, 2022 • Updated 3 years ago
opendatalab / labelU-Kit
Data annotation component library, provided as NPM packages
☆159 • Jul 21, 2026 • Updated last week
NHirose / ExAug
☆11 • Mar 15, 2023 • Updated 3 years ago
liuyanyi / AD-Toolbox
Aerial Detection Toolbox
☆11 • Jan 18, 2023 • Updated 3 years ago
opendatalab / opendatalab-datasets
datasets resource
☆150 • May 27, 2026 • Updated 2 months ago
yueyingyehua / gongkongSpiders
Crawls CNVD, CNNVD, and the China Industrial Control Network (中国工控网), with selection and analysis of industrial control websites
☆18 • Jan 8, 2018 • Updated 8 years ago
illumara / VTON-HandFit
☆11 • Aug 27, 2024 • Updated last year
noear / esearchx
noear::A simple Elasticsearch ORM framework (based on lambda expressions, offering a SQL-like experience)
☆20 • Apr 25, 2026 • Updated 3 months ago
TscCai / IEC61850Packet
a library to generate and resolve packets defined in IEC 61850
☆14 • May 13, 2016 • Updated 10 years ago
opendatalab / RxnCaption
[CVPR 2026] SOTA Chemical Reaction Diagram Parsing Framework
☆26 • Mar 24, 2026 • Updated 4 months ago
callsys / FlowText
[ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation
☆13 • May 13, 2023 • Updated 3 years ago
opendatalab / WanJuan3.0
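WanJuan3.0 ("万卷·丝路", WanJuan Silk Road) is a comprehensive plain-text corpus that collects publicly available web information, literature, patents, and other materials from multiple countries and regions. Its total data size exceeds 1.2TB and its total token count exceeds 300B, placing it at an internationally leading level. The first open-source release mainly consists of five subsets: Thai, Russian, Arabic, Korean, and Vietnamese; the data of each subset…
☆47 • Feb 13, 2025 • Updated last year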
V3Det / mmdetection-V3Det
OpenMMLab Detection Toolbox and Benchmark for V3Det
☆15 • Apr 3, 2024 • Updated 2 years ago
ThomasZB / ble-positioning
Bluetooth 5.1 indoor positioning
☆12 • Jun 8, 2022 • Updated 4 years ago
tjuHaoXiaotian / MA-MuZero
MuZero for Combinatorial Action Spaces: open-source codebase for MA-Gumbel-AlphaZero, MA-Sampled-AlphaZero, MA-Gumbel-MuZero and MA-Sampl…
☆23 • Jan 22, 2024 • Updated 2 years ago
Werkov / dove-eye
Poor man's Hawk-Eye (object 3D tracking)
☆12 • Dec 31, 2018 • Updated 7 years ago
opendatalab / MinerU-Ecosystem
☆168 • May 11, 2026 • Updated 2 months ago
ylsung / ECoFLaP
Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)
☆21 • Feb 16, 2024 • Updated 2 years ago
Simonhfls / GAPS
This repo is the official implementation of GAPS: Geometry-Aware, Physics-Based, Self-Supervised Neural Garment Draping, 3DV 2024
☆22 • Feb 27, 2024 • Updated 2 years ago
Li-Qingyun / mmdetection
OpenMMLab Detection Toolbox and Benchmark
☆11 • Aug 1, 2023 • Updated 2 years ago
prclibo / ice
Interpretable Control Exploration and Counterfactual Explanation (ICE) on StyleGAN
☆17 • Jan 5, 2022 • Updated 4 years ago
lowrollr / mctx-az
Monte Carlo tree search in JAX, with functionality to continue search from a previous subtree
☆27 • May 2, 2025 • Updated last year
dnap512 / SROD
This project is an implementation of two-step object detection (super-resolution and object detection) to address degradation of object d…
☆10 • May 29, 2021 • Updated 5 years ago
opendatalab / WanJuan1.0
WanJuan 1.0 multimodal corpus
☆574 • Oct 20, 2023 • Updated 2 years ago
zouzx / sc-neus
☆16 • Jul 5, 2023 • Updated 3 years ago
FuchenUSTC / DTF
☆16 • Aug 5, 2022 • Updated 3 years ago
georgeliu233 / DRLFD_Urban
[IEEE IV 22'] Code for 'Improved Deep Reinforcement Learning with Expert Demonstrations for Urban Autonomous Driving'
☆14 • Jun 17, 2021 • Updated 5 years ago
ttcong194 / Cocos-to-PlayableAd-HTML5
Use Cocos Creator 2.4.3 to build a single HTML5 file for playable ads that can be used on Facebook, UnityAds, and GoogleAds
☆17 • Jul 25, 2021 • Updated 5 years ago
fahadshamshad / deep-facial-privacy-prior
[ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors".
☆12 • Oct 11, 2024 • Updated last year