CASIA-LM / ChineseWebText-2.0
Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information
☆26 Updated 6 months ago
Alternatives and similar repositories for ChineseWebText-2.0
Users who are interested in ChineseWebText-2.0 are comparing it to the libraries listed below.
- ☆97 Updated last year
- EMNLP'2024: Knowledge Verification to Nip Hallucination in the Bud ☆22 Updated last year
- WuDao (悟道) data ☆44 Updated 3 years ago
- ☆48 Updated last year
- Official GitHub repo for ACLUE, an evaluation benchmark focused on ancient Chinese language comprehension ☆30 Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆79 Updated 7 months ago
- ☆169 Updated last year
- Text deduplication ☆72 Updated last year
- Source code for ACL 2023 paper Decoder Tuning: Efficient Language Understanding as Decoding ☆50 Updated 2 years ago
- T2Ranking: A large-scale Chinese benchmark for passage ranking. ☆159 Updated last year
- ☆53 Updated 3 years ago
- The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model ☆117 Updated 6 months ago
- Tool for converting LLMs from uni-directional to bi-directional by removing causal mask for tasks like classification and sentence embedd… ☆60 Updated 6 months ago
- [ACL 2024 Findings] Learning Fine-Grained Grounded Citations for Attributed Large Language Models ☆18 Updated 8 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 Updated last year
- Rephrasing Language Model for CSC (AAAI 2024) ☆41 Updated last year
- Code & Data for our Paper "RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation" (EMNLP 2023) ☆17 Updated last year
- ☆76 Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆158 Updated 9 months ago
- MEASURING MASSIVE MULTITASK CHINESE UNDERSTANDING ☆87 Updated last year
- ☆141 Updated last year
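- GeWu (格物): multilingual and Chinese large-scale pre-trained models, lightweight edition, covering pure-Chinese, knowledge-enhanced, and 113-language multilingual variants; uses the mainstream RoBERTa architecture, is suited to NLU and NLG tasks, and supports frameworks such as pytorch, tensorflow, uer, and huggingface. Multilingual and Chinese … ☆29 Updated 2 years ago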
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆131 Updated last year
- Code implementation of Baichuan Dynamic NTK-ALiBi: inference on longer texts without fine-tuning ☆47 Updated last year
- OPD: Chinese Open-Domain Pre-trained Dialogue Model ☆75 Updated 2 years ago
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs ☆35 Updated 9 months ago
- Chinese instruction tuning datasets ☆132 Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆69 Updated last year
- The baseline method for CCIR 22 https://www.datafountain.cn/competitions/573 ☆13 Updated 2 years ago
- Fine-tuning large language models with the DPO algorithm; simple and easy to get started. ☆39 Updated 11 months ago