The-Orizon / nlputilsLinks

Utility scripts or libraries for various Natural Language Processing tasks.

☆38

Alternatives and similar repositories for nlputils

Users that are interested in nlputils are comparing it to the libraries listed below

Sorting:

berniey / hanziconv
Hanzi Converter for Traditional and Simplified Chinese
☆190Updated 5 years ago
zhangyics / Chinese-abbreviation-dataset
This is a corpus of Chinese abbreviation, including negative full forms.
☆197Updated 4 years ago
panyang / MeCab-Chinese
Chinese morphological analysis with Word Segment and POS Tagging data for MeCab
☆162Updated 8 years ago
shenshen-hungry / Ancient-Chinese-Segmentation
A tool for ancient Chinese segmentation.
☆54Updated 6 years ago
tsroten / zhon
Constants used in Chinese text processing
☆377Updated 10 months ago
howl-anderson / MicroTokenizer
一个轻量且功能全面的中文分词器，帮助学生了解分词器的工作原理。MicroTokenizer: A lightweight Chinese tokenizer designed for educational and research purposes. Provides a…
☆157Updated last year
UniversalDependencies / UD_Chinese-GSD
☆96Updated last month
stopwords-iso / stopwords-zh
Chinese stopwords collection
☆138Updated 5 years ago
sunpinyin / open-gram
an open solution for collecting n-gram Chinese lexicon and n-gram statistics
☆73Updated 9 years ago
liuhuanyong / ChineseAntiword
chinese anti semantic word search interface based on dict crawled from online resources, ChineseAntiword,针对中文词语的反义词查询接口
☆59Updated 7 years ago
crownpku / Somiao-Pinyin
Somiao Pinyin: Train your own Chinese Input Method with Seq2seq Model 搜喵拼音输入法
☆272Updated 5 years ago
bojone / chinese-gen
中文生成式预训练模型
☆99Updated 5 years ago
ml-distribution / chinese-corpus
中文相关词典和语料库。
☆175Updated 11 years ago
hankcs / multi-criteria-cws
Simple Solution for Multi-Criteria Chinese Word Segmentation
☆303Updated 5 years ago
howl-anderson / Chinese_tokenizer_benchmark
中文分词软件基准测试 | Chinese tokenizer benchmark
☆25Updated 7 years ago
kite1988 / nus-sms-corpus
☆128Updated 7 years ago
THUNLP-AIPoet / StylisticPoetry
Codes for Stylistic Chinese Poetry Generation via Unsupervised Style Disentanglement (EMNLP 2018)
☆195Updated 5 years ago
WuLC / ThesaurusSpider
下载搜狗、百度、QQ输入法的词库文件的 python 爬虫，可用于构建不同行业的词汇库
☆116Updated 8 years ago
ymcui / Chinese-Cloze-RC
A Chinese Cloze-style RC Dataset: People's Daily & Children's Fairy Tale (CFT)
☆173Updated 6 years ago
love1life / cqa
☆67Updated 8 years ago
AiningWang / Chinese-Words-Segmentation
Chinese word segmentation algorithm based on entropy（基于熵，无需语料库的中文分词）
☆11Updated 7 years ago
HIT-SCIR / ltp-cws
Chinese word segmentation module of LTP
☆46Updated 10 years ago
BrikerMan / classic_chinese_punctuate
classic Chinese punctuate experiment with keras using daizhige(殆知阁古代文献藏书) dataset
☆35Updated 2 years ago
gamelife1314 / scel2txt
转换搜狗拼音词库为txt文件
☆50Updated 8 years ago
ling0322 / webdict
一个中文词库
☆346Updated 11 years ago
howl-anderson / chinese-wikipedia-corpus-creator
Corpus creator for Chinese Wikipedia
☆41Updated 4 years ago
thunlp / THUCKE
THU Chinese Keyphrase Extraction Toolkit
☆124Updated 7 years ago
wainshine / Book-Names-Corpus
图书名语料库。含部分电影、游戏名称。
☆72Updated last year
jannson / wordmaker
auto generate chinese words in huge text.
☆92Updated 10 years ago
liuhuanyong / ChineseCixing
WordForm,针对中文词语的笔画拆解，偏旁查询，拼音转换接口
☆65Updated 7 years ago