The-Orizon / nlputils
Utility scripts or libraries for various Natural Language Processing tasks.
☆39Updated 3 years ago
Alternatives and similar repositories for nlputils:
Users who are interested in nlputils are comparing it to the libraries listed below.
- ☆92Updated 4 months ago
- Chinese character dataset, including information about each character such as stroke count, radical, pinyin, and English definitions/synonyms.☆115Updated 4 years ago
- Chinese tokenizer benchmark☆23Updated 6 years ago
- An open-source classical Chinese information processing toolkit developed by Tsinghua Natural Language Processing Group☆51Updated 6 years ago
- Hanzi Converter for Traditional and Simplified Chinese☆184Updated 5 years ago
- A tool for ancient Chinese segmentation.☆53Updated 5 years ago
- Berserker - BERt chineSE woRd toKenizER☆16Updated 6 years ago
- an open solution for collecting n-gram Chinese lexicon and n-gram statistics☆74Updated 9 years ago
- Classical Chinese punctuation experiment with Keras using the daizhige (殆知阁, a collection of ancient Chinese texts) dataset☆34Updated 2 years ago
- NanGe - A Rule-based Chinese-English Machine Translation System☆20Updated 7 years ago
- MicroTokenizer: A lightweight, full-featured Chinese tokenizer that helps students understand how tokenizers work, designed for educational and research purposes. Provides a…☆150Updated 5 months ago
- Self-implemented word collocation extraction using the mutual information (MI) method, which has been tested to be effective.☆28Updated 6 years ago
- Chinese word segmentation module of LTP☆46Updated 9 years ago
- Chinese word segmentation algorithm based on entropy (requires no corpus)☆11Updated 7 years ago
- Classical Chinese corpus☆285Updated 2 years ago
- Chinese generative pre-trained model☆98Updated 4 years ago
- ☆34Updated 9 months ago
- Evaluation of Chinese word segmentation tools☆61Updated 2 years ago
- Conceptual Keyboard☆30Updated 2 years ago
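- Headwords (characters and words) from the 《现代汉语大词典》 (a comprehensive dictionary of modern Chinese)☆26Updated 4 years ago
- Classical Chinese (文言文) dictionary: scrapes a Classical Chinese dictionary website to build a Kindle dictionary.☆66Updated 6 years ago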
- Chinese Tokenizer module for Python☆15Updated 6 years ago
- THU Chinese Keyphrase Extraction Toolkit☆125Updated 6 years ago
- Chinese stopwords collection☆134Updated 5 years ago
- High-performance small model evaluation. Shared Tasks in NLPCC 2020. Task 1 - Light Pre-Training Chinese Language Model for NLP Task☆58Updated 4 years ago
- The multilingual variant of GLM, a general language model trained with autoregressive blank infilling objective☆62Updated 2 years ago
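- Backup of Zhihu exposé posts about 香侬科技 (北京香侬慧语科技有限责任公司)☆41Updated 4 years ago
- Hanyu Pinyin conversion table☆39Updated 4 years ago
- A corpus of Chinese abbreviations, including negative full forms.☆194Updated 3 years ago
- Python crawler that downloads lexicon files for the Sogou, Baidu, and QQ input methods; can be used to build vocabularies for different industries☆113Updated 7 years ago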