SimmerChan/corpus

Corpora for natural language processing and knowledge graphs, organized by task. PRs welcome.

☆735

Alternatives and similar repositories for corpus

Users who are interested in corpus are comparing it to the repositories listed below.

InsaneLife / ChineseNLPCorpus
Chinese NLP datasets, material for everyday experiments. Additions and merge requests are welcome.
☆4,603 · Nov 21, 2023 · Updated 2 years ago
CLUEbenchmark / CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included.
☆4,459 · Nov 21, 2022 · Updated 3 years ago
zhpmatrix / nlp-competitions-list-review
Reviews of the top solutions from NLP competitions. Focused exclusively on NLP competitions and continuously updated.
☆2,804 · Apr 4, 2026 · Updated 3 months ago
Embedding / Chinese-Word-Vectors
100+ Chinese Word Vectors: over a hundred kinds of pre-trained Chinese word vectors
☆12,229 · Oct 30, 2023 · Updated 2 years ago
brightmart / nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
☆9,904 · Feb 6, 2026 · Updated 5 months ago
liuhuanyong / ChineseSemanticKB
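ChineseSemanticKB, a Chinese semantic knowledge base: million-scale common semantic dictionaries in 12 categories for Chinese language processing, including a 340K-entry abstract semantics library, a 340K-entry antonym library, and a 430K-entry synonym library. Supports applications such as sentence expansion, rewriting, and event abstraction and generalization.
☆783 · Mar 17, 2023 · Updated 3 years ago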
loujie0822 / DeepIE
DeepIE: Deep Learning for Information Extraction
☆1,937 · Dec 9, 2022 · Updated 3 years ago
CLUEbenchmark / CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
☆4,271 · Feb 6, 2026 · Updated 5 months ago
ymcui / Chinese-BERT-wwm
Pre-Training with Whole Word Masking for Chinese BERT (the Chinese BERT-wwm model series)
☆10,223 · Apr 19, 2026 · Updated 3 months ago
BDBC-KG-NLP / QA-Survey-CN
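A summary of research and applications in intelligent question answering by the NLP research team of the Big Data Advanced Innovation Center at Beihang University. Covers knowledge-graph-based QA (KBQA), text-based QA systems (TextQA), table-based QA systems (TableQA), visual QA systems (VisualQA), machine reading comprehension (MRC), and more; for each task type, …
☆1,815 · Apr 6, 2023 · Updated 3 years ago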
luge-ai / luge-ai
☆436 · Apr 25, 2025 · Updated last year
CLUEbenchmark / CLUENER2020
CLUENER2020: Chinese Fine Grained Named Entity Recognition
☆1,520 · Nov 21, 2022 · Updated 3 years ago
SophonPlus / ChineseNlpCorpus
Collects, organizes, and publishes Chinese NLP corpora and datasets, working with like-minded people to advance Chinese NLP.
☆6,588 · Jan 29, 2019 · Updated 7 years ago
OYE93 / Chinese-NLP-Corpus
Collections of Chinese NLP corpus
☆921 · Dec 28, 2020 · Updated 5 years ago
ZhuiyiTechnology / pretrained-models
Open Language Pre-trained Model Zoo
☆1,003 · Nov 18, 2021 · Updated 4 years ago
TingFree / NLPer-Arsenal
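Collects NLP competition strategy implementations, baselines for each task, competition write-ups (current, past, and practice competitions), NLP conference dates, popular self-media channels, GPU recommendations, and more. Continuously updated.
☆2,239 · Aug 29, 2023 · Updated 2 years ago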
bojone / bert4keras
Keras implementation of transformers for humans
☆5,417 · Nov 11, 2024 · Updated last year
brightmart / albert_zh
A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS, large-scale pre-trained Chinese ALBERT models
☆3,979 · Nov 21, 2022 · Updated 3 years ago
ZhuiyiTechnology / simbert
a bert for retrieval and generation
☆860 · Feb 26, 2021 · Updated 5 years ago
NiuTrans / CNSurvey
A list of Chinese-language survey articles (natural language processing & machine learning)
☆581 · May 26, 2023 · Updated 3 years ago
LeeSureman / Flat-Lattice-Transformer
code for ACL 2020 paper: FLAT: Chinese NER Using Flat-Lattice Transformer
☆1,003 · May 10, 2022 · Updated 4 years ago
BDBC-KG-NLP / IE-Survey
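A survey of the information extraction field by the NLP research team of the Big Data Advanced Innovation Center at Beihang University. Covers subtasks such as entity recognition, relation extraction, and attribute extraction, with each subtask surveyed across both academia and industry.
☆471 · Apr 29, 2022 · Updated 4 years ago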
thu-coai / NLG_book
Introduction to the book "Modern Natural Language Generation" (《现代自然语言生成》)
☆223 · Jan 9, 2021 · Updated 5 years ago
ChineseGLUE / ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
☆1,783 · Feb 18, 2023 · Updated 3 years ago
huawei-noah / Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
☆3,162 · Jan 22, 2024 · Updated 2 years ago
FudanNLP / fastNLP
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
☆3,144 · Jun 5, 2023 · Updated 3 years ago
boat-group / fancy-nlp
NLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.
☆283 · Dec 8, 2022 · Updated 3 years ago
crownpku / Awesome-Chinese-NLP
A curated list of resources for Chinese NLP
☆7,929 · Jul 27, 2023 · Updated 2 years ago
km1994 / nlp_paper_study
Reading notes on top-conference papers relevant to NLP algorithm engineers.
☆4,033 · Aug 18, 2023 · Updated 2 years ago
didi / ChineseNLP
Datasets and SOTA results for every field of Chinese NLP
☆1,806 · Apr 7, 2022 · Updated 4 years ago
fighting41love / NLP_Corpus_Plan
☆37 · Jun 14, 2019 · Updated 7 years ago
dbiir / UER-py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
☆3,110 · May 9, 2024 · Updated 2 years ago
yongzhuo / Macadam
Macadam is an NLP toolkit built on Tensorflow (Keras) and bert4keras, focused on text classification, sequence labeling, and relation extraction. Supports RANDOM, WORD2VEC, FASTTEXT, BERT, ALBERT, ROBERTA, NEZHA, XLNET, ELECTRA…
☆325 · Mar 24, 2023 · Updated 3 years ago
z814081807 / DeepNER
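Winning solution of the Tianchi entity recognition challenge on traditional Chinese medicine package inserts; Chinese named entity recognition; NER; BERT-CRF & BERT-SPAN & BERT-MRC; Pytorch
☆967 · Dec 23, 2020 · Updated 5 years ago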
brightmart / roberta_zh
Pre-trained Chinese RoBERTa models: RoBERTa for Chinese
☆2,793 · Jul 22, 2024 · Updated last year
ymcui / Chinese-ELECTRA
Pre-trained Chinese ELECTRA models
☆1,433 · Apr 19, 2026 · Updated 3 months ago
zzy99 / epidemic-sentence-pair
First-place online solution for the Tianchi epidemic-related similar sentence pair judgment competition
☆434 · Oct 17, 2020 · Updated 5 years ago
425776024 / nlpcda
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
☆1,880 · Mar 18, 2025 · Updated last year
husthuke / awesome-knowledge-graph
A curated collection of learning materials on knowledge graphs
☆5,138 · Mar 11, 2021 · Updated 5 years ago