MicroTokenizer: A lightweight yet full-featured Chinese tokenizer designed for educational and research purposes, built to help students understand how tokenizers work. It provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.
☆158 · Oct 18, 2024 · Updated last year
Alternatives and similar repositories for MicroTokenizer
Users that are interested in MicroTokenizer are comparing it to the libraries listed below.
- A micro Python package for HMM (Hidden Markov Model) ☆15 · Jan 15, 2020 · Updated 6 years ago
- Corpus creator for Chinese Wikipedia ☆41 · Jun 30, 2021 · Updated 4 years ago
- A micro regular expression engine ☆37 · Sep 22, 2019 · Updated 6 years ago
- rasa_chinese: a Rasa component extension package dedicated to the Chinese language, providing many Chinese-specific components ☆151 · May 11, 2023 · Updated 2 years ago
- Models for spaCy that support Chinese ☆674 · Jan 4, 2025 · Updated last year
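- Programmer political work handbook: how to build a united team of programmers ☆13 · May 2, 2019 · Updated 6 years ago
- ☆16 · Jul 21, 2020 · Updated 5 years ago
- Financial Brain: financial intelligence NLP service competition ☆17 · Apr 27, 2019 · Updated 6 years ago
- CCKS 2021 event extraction competition ☆30 · Jul 21, 2021 · Updated 4 years ago
- Lexicon crawler and format converter for the Sogou Input Method (搜狗输入法) ☆26 · Apr 25, 2018 · Updated 7 years ago
- ☆11 · Aug 10, 2022 · Updated 3 years ago
- New word discovery based on mutual information and left/right entropy, implemented in Python 3 ☆592 · Aug 1, 2019 · Updated 6 years ago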
- A very easy BERT pretraining process using the tokenizers and transformers repos