WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
☆13 · Apr 18, 2024 · Updated last year
Alternatives and similar repositories for WanJuan2.0-WanJuan-CC
Users that are interested in WanJuan2.0-WanJuan-CC are comparing it to the libraries listed below.
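- Data Set Description Language Specification (DSDL, a dataset description language for next-generation AI) ☆46 · May 29, 2024 · Updated last year
- SDK for OpenDataLab (https://opendatalab.org.cn) ☆59 · Jul 31, 2025 · Updated 7 months ago
- A multi-turn Chinese chatbot fine-tuned from GPT-2 on more than 1.1 million cleaned chat samples, with replies filtered by semantic similarity and text Jaccard similarity. ☆23 · Nov 13, 2021 · Updated 4 years ago
- WanJuan3.0 ("WanJuan·Silk Road", 万卷·丝路) is a comprehensive plain-text corpus collected from publicly available web content, literature, patents, and other materials from multiple countries and regions. It exceeds 1.2 TB and 300B tokens in total, which its authors describe as internationally leading. The first open-source release consists mainly of five subsets (Thai, Russian, Arabic, Korean, and Vietnamese), and the data in each subset… ☆43 · Feb 13, 2025 · Updated last year
- [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model ☆19 · Apr 16, 2024 · Updated last year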
- AAAI 2024: Visual Instruction Generation and Correction ☆96 · Feb 4, 2024 · Updated 2 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆37 · Nov 27, 2024 · Updated last year
- [NeurIPS 2024] Search for Efficient LLMs ☆16 · Jan 16, 2025 · Updated last year
- Data annotation component library, provided as NPM packages ☆147 · Mar 18, 2026 · Updated last week
- Search all Chinese NLP datasets, with commonly used English NLP datasets included ☆14 · Mar 1, 2020 · Updated 6 years ago
- Dataset resources ☆132 · Jul 1, 2025 · Updated 8 months ago
- Save stats from a C program to an InfluxDB database in a simple way. Only 12 functions in total. ☆15 · Dec 19, 2022 · Updated 3 years ago
- ☆11 · Mar 15, 2023 · Updated 3 years ago
- Aerial Detection Toolbox ☆11 · Jan 18, 2023 · Updated 3 years ago
- Crawls CNVD, CNNVD, and China Industrial Control Network (中国工控网), with selection and analysis of industrial control websites ☆18 · Jan 8, 2018 · Updated 8 years ago
- ☆11 · Aug 27, 2024 · Updated last year
- noear:: a simple Elasticsearch ORM framework (built on lambda expressions for an SQL-like experience) ☆19 · Feb 2, 2026 · Updated last month
- A library to generate and resolve packets defined in IEC 61850 ☆14 · May 13, 2016 · Updated 9 years ago
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation ☆13 · May 13, 2023 · Updated 2 years ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 · Apr 3, 2024 · Updated last year
- EarthVL: A Progressive Earth Vision-Language Understanding and Generation Framework ☆37 · Jan 22, 2026 · Updated 2 months ago
- A Python package for interacting with the MinerU Vision-Language Model ☆109 · Updated this week
- Poor man's Hawk-Eye (object 3D tracking) ☆12 · Dec 31, 2018 · Updated 7 years ago
- MuZero for Combinatorial Action Spaces: open-source codebase for MA-Gumbel-AlphaZero, MA-Sampled-AlphaZero, MA-Gumbel-MuZero and MA-Sampl… ☆23 · Jan 22, 2024 · Updated 2 years ago
- WanJuan 1.0 multimodal corpus ☆571 · Oct 20, 2023 · Updated 2 years ago
- Interpretable Control Exploration and Counterfactual Explanation (ICE) on StyleGAN ☆17 · Jan 5, 2022 · Updated 4 years ago
- OpenMMLab Detection Toolbox and Benchmark ☆11 · Aug 1, 2023 · Updated 2 years ago
- This project is an implementation of two-step object detection (super-resolution and object detection) to address degradation of object d… ☆10 · May 29, 2021 · Updated 4 years ago
- Monte Carlo tree search in JAX, with functionality to continue search from a previous subtree ☆26 · May 2, 2025 · Updated 10 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Feb 16, 2024 · Updated 2 years ago
- ☆16 · Jul 5, 2023 · Updated 2 years ago
- ☆16 · Aug 5, 2022 · Updated 3 years ago
- Bluetooth 5.1 indoor positioning ☆12 · Jun 8, 2022 · Updated 3 years ago
- [IEEE IV '22] Code for 'Improved Deep Reinforcement Learning with Expert Demonstrations for Urban Autonomous Driving' ☆14 · Jun 17, 2021 · Updated 4 years ago
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". ☆12 · Oct 11, 2024 · Updated last year
- Use Cocos Creator 2.4.3 to build a single HTML5 file for playable ads that can run on Facebook, UnityAds, and GoogleAds ☆17 · Jul 25, 2021 · Updated 4 years ago
- ☆12 · Dec 4, 2023 · Updated 2 years ago
- ICME2022 Special Session "Beyond Accuracy: Responsible, Responsive, and Robust Multimedia Retrieval" ☆12 · Jun 3, 2024 · Updated last year
- GraphQL application using the Spring 5 reactive framework (WebFlux) ☆45 · Mar 16, 2018 · Updated 8 years ago