MicroTokenizer: a lightweight yet full-featured Chinese tokenizer designed for educational and research purposes, built to help students understand how tokenizers work. It provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.
☆158 · Oct 18, 2024 · Updated last year
Alternatives and similar repositories for MicroTokenizer
Users who are interested in MicroTokenizer are comparing it to the libraries listed below.
- Chinese tokenizer benchmark for word segmentation software ☆25 · Sep 5, 2018 · Updated 7 years ago
- A micro Python package for HMM (Hidden Markov Model) ☆15 · Jan 15, 2020 · Updated 6 years ago
- Corpus creator for Chinese Wikipedia ☆41 · Jun 30, 2021 · Updated 4 years ago
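- rasa_chinese: a Rasa component extension package dedicated to the Chinese language, providing many Chinese-specific components ☆151 · May 11, 2023 · Updated 2 years ago
- Financial Brain: Financial Intelligence NLP Service competition ☆17 · Apr 27, 2019 · Updated 6 years ago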
- A project of N-gram model comparing FMM/BMM ☆17 · Oct 17, 2022 · Updated 3 years ago
- Models for SpaCy that support Chinese ☆673 · Jan 4, 2025 · Updated last year
- Word dictionary crawler and format converter for Sogou Input Method (搜狗输入法) dictionaries ☆26 · Apr 25, 2018 · Updated 7 years ago
- A micro regular expression engine ☆37 · Sep 22, 2019 · Updated 6 years ago
- Fine-tuning Quantized Neural Networks with Zeroth-order Optimization ☆16 · Sep 17, 2025 · Updated 5 months ago
- New word discovery using mutual information and left/right entropy, implemented in Python 3 ☆593 · Aug 1, 2019 · Updated 6 years ago
- ☆13 · May 19, 2023 · Updated 2 years ago
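- Programmer's Political Work Handbook: how to build a united team of programmers ☆13 · May 2, 2019 · Updated 6 years ago
- A word similarity calculation method that combines the extended Tongyici Cilin (同义词词林扩展版) with HowNet (知网), offering broader vocabulary coverage and more accurate results. ☆744 · Feb 16, 2022 · Updated 4 years ago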
- TensorFlow implementation of the paper `Adversarial Multi-task Learning for Text Classification` ☆11 · Apr 11, 2018 · Updated 7 years ago
- ☆13 · Dec 3, 2017 · Updated 8 years ago
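- Python 3 implementations of the algorithm interview questions in the second edition of 《剑指offer》. As a classic book, it is worth revisiting, browsing, and memorizing from time to time. The project also aims to help Python programmers pass companies' technical interviews and land the offer they want. ☆119 · Jan 9, 2026 · Updated 2 months ago
- A very easy BERT pretraining process using the tokenizers and transformers repos ☆32 · Feb 27, 2020 · Updated 6 years ago
- A general-purpose sequence labeling library based on TensorFlow & PaddlePaddle (currently includes BiLSTM+CRF, Stacked-BiLSTM+CRF, and IDCNN+CRF, with more algorithms being added) for Chinese word segmentation (tokenizer / segmentation), part-of-speech tagging… ☆86 · Dec 8, 2022 · Updated 3 years ago
- A summary of the key points for using T5 models in Keras ☆174 · Mar 4, 2022 · Updated 4 years ago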
- N-grams approximate string matching implementation in pure Python ☆26 · Sep 20, 2010 · Updated 15 years ago
- A Query Language and System for Python Objects ☆26 · Oct 20, 2017 · Updated 8 years ago
- A crawler for Web of Science data, focusing especially on the analysis of citation data ☆16 · Dec 14, 2018 · Updated 7 years ago
- Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI] ☆16 · Jan 28, 2022 · Updated 4 years ago
- A Chinese character feature extractor (featurizer), which extracts the features of Chinese characters (pronunciation features and glyph features) for use as deep learning features ☆299 · Dec 29, 2025 · Updated 2 months ago
- DeepDive Tutorial with Chinese Support ☆35 · Oct 3, 2021 · Updated 4 years ago
- CCKS 2021 event extraction competition ☆30 · Jul 21, 2021 · Updated 4 years ago
- ☆16 · Jul 21, 2020 · Updated 5 years ago
- Simple inverted-index full-text search engine written in Python ☆13 · Oct 3, 2013 · Updated 12 years ago
- A publishing website of a table collecting meta-learning-related papers in the area of human language processing. ☆17 · Aug 2, 2021 · Updated 4 years ago
- Easy Data Augmentation for NLP on Chinese ☆16 · Aug 3, 2019 · Updated 6 years ago
- CCKS Baidu entity linking, first place ☆842 · Dec 19, 2023 · Updated 2 years ago
- Automatic construction of a Chinese lexicon: http://www.matrix67.com/blog/archives/5044 ☆655 · Dec 5, 2023 · Updated 2 years ago
- Docker image for WebProtégé ☆17 · Jan 26, 2019 · Updated 7 years ago
- A simple neural network framework implemented with NumPy ☆15 · Oct 3, 2018 · Updated 7 years ago
- A web-based annotation tool for natural language processing (NLP) ☆530 · Dec 11, 2022 · Updated 3 years ago
- New word discovery algorithm (NewWordDetection) ☆63 · Sep 4, 2017 · Updated 8 years ago
- Chinese word segmentation algorithm without a corpus ☆500 · Sep 3, 2020 · Updated 5 years ago
- Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT ☆2,265 · Feb 1, 2024 · Updated 2 years ago