shibing624/pycorrector

pycorrector is a toolkit for text error correction. It applies Kenlm, T5, MacBERT, ChatGLM3, Qwen2.5, and other models to error correction scenarios and works out of the box.

☆6,487

Alternatives and similar repositories for pycorrector

Users who are interested in pycorrector compare it to the repositories listed below.

iqiyi / FASPell
2019 SOTA spell checker for Simplified and Traditional Chinese: FASPell Chinese Spell Checker (Chinese Spell Check / Chinese spelling error detection / Chinese spelling correction / Chinese spell checking)
☆1,224 · Sep 3, 2022 · Updated 3 years ago
HillZhang1999 / MuCGEC
Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Gr…
☆569 · Jun 9, 2023 · Updated 3 years ago
ymcui / Chinese-BERT-wwm
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
☆10,222 · Apr 19, 2026 · Updated 2 months ago
brightmart / nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
☆9,905 · Feb 6, 2026 · Updated 5 months ago
brightmart / albert_zh
A Lite BERT for Self-Supervised Learning of Language Representations; a large collection of pre-trained Chinese ALBERT models
☆3,980 · Nov 21, 2022 · Updated 3 years ago
ccheng16 / correction
Chinese "spelling" error correction
☆265 · Nov 28, 2017 · Updated 8 years ago
Embedding / Chinese-Word-Vectors
100+ pre-trained Chinese word vectors
☆12,230 · Oct 30, 2023 · Updated 2 years ago
CLUEbenchmark / CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
☆4,270 · Feb 6, 2026 · Updated 5 months ago
wdimmy / Automatic-Corpus-Generation
This repository is for the paper "A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check"
☆295 · Oct 10, 2019 · Updated 6 years ago
gitabtion / BertBasedCorrectionModels
PyTorch implementations of BERT-based Spelling Error Correction Models.
☆277 · Feb 17, 2025 · Updated last year
ACL2020SpellGCN / SpellGCN
SpellGCN
☆249 · Feb 28, 2021 · Updated 5 years ago
tiantian91091317 / OCR-Corrector
Corrects OCR recognition errors using a language model
☆473 · May 22, 2023 · Updated 3 years ago
bojone / bert4keras
Keras implementation of transformers for humans
☆5,417 · Nov 11, 2024 · Updated last year
dbiir / UER-py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
☆3,110 · May 9, 2024 · Updated 2 years ago
InsaneLife / ChineseNLPCorpus
Chinese NLP datasets, material for everyday experiments. Additions and pull requests are welcome.
☆4,599 · Nov 21, 2023 · Updated 2 years ago
ZhuiyiTechnology / pretrained-models
Open Language Pre-trained Model Zoo
☆1,004 · Nov 18, 2021 · Updated 4 years ago
kpu / kenlm
KenLM: Faster and Smaller Language Model Queries
☆2,791 · Mar 30, 2025 · Updated last year
chatopera / Synonyms
Chinese synonyms: a toolkit for chatbots and intelligent question answering
☆5,107 · Feb 1, 2026 · Updated 5 months ago
nghuyong / Chinese-text-correction-papers
Text correction papers
☆315 · Jan 23, 2024 · Updated 2 years ago
425776024 / nlpcda
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
☆1,880 · Mar 18, 2025 · Updated last year
dongrixinyu / JioNLP
A Chinese NLP Preprocessing & Parsing Package: accurate, efficient, and easy to use. www.jionlp.com
☆3,852 · Jun 5, 2026 · Updated last month
fighting41love / funNLP
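Chinese and English sensitive words, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, 汽…
☆81,820 · May 10, 2024 · Updated 2 years ago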
LianjiaTech / BELLE
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)
☆8,274 · Oct 16, 2024 · Updated last year
ymcui / Chinese-ELECTRA
Pre-trained Chinese ELECTRA models
☆1,433 · Apr 19, 2026 · Updated 2 months ago
brightmart / roberta_zh
Pre-trained Chinese RoBERTa models: RoBERTa for Chinese
☆2,792 · Jul 22, 2024 · Updated last year
ZhuiyiTechnology / simbert
A BERT for retrieval and generation
☆860 · Feb 26, 2021 · Updated 5 years ago
destwang / CTCResources
☆270 · Jul 26, 2024 · Updated last year
tongchangD / bert_for_corrector
Chinese text error correction based on BERT
☆242 · Jun 12, 2023 · Updated 3 years ago
macanv / BERT-BiLSTM-CRF-NER
TensorFlow solution for the NER task using a BiLSTM-CRF model with Google BERT fine-tuning and private server services
☆4,907 · Feb 24, 2021 · Updated 5 years ago
CLUEbenchmark / CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
☆4,457 · Nov 21, 2022 · Updated 3 years ago
PaddlePaddle / ERNIE
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
☆7,722 · Jan 4, 2026 · Updated 6 months ago
crownpku / Awesome-Chinese-NLP
A curated list of resources for Chinese NLP
☆7,929 · Jul 27, 2023 · Updated 2 years ago
zhanlaoban / EDA_NLP_for_Chinese
An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese corpora. NLP data augmentation. Paper reading notes.
☆1,382 · May 31, 2022 · Updated 4 years ago
SophonPlus / ChineseNlpCorpus
Collects, organizes, and publishes Chinese NLP corpora and datasets, working with like-minded people to advance Chinese natural language processing.
☆6,584 · Jan 29, 2019 · Updated 7 years ago
ChineseGLUE / ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
☆1,783 · Feb 18, 2023 · Updated 3 years ago
beyondacm / Autochecker4Chinese
Typo detection and automatic correction for Chinese text / Autochecker & autocorrecter for Chinese
☆298 · Sep 16, 2017 · Updated 8 years ago
liushulinle / PLOME
Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021
☆241 · Aug 16, 2022 · Updated 3 years ago
hankcs / HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dep…
☆36,451 · Nov 15, 2025 · Updated 8 months ago
shibing624 / text2vec
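text2vec, text to vector. A text vector representation tool that converts text into vector matrices; it implements Word2Vec, RankBM25, Sentence-BERT, CoSENT, and other text representation and text similarity models, and works out of the box.
☆4,971 · Feb 14, 2026 · Updated 5 months ago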