howl-anderson/MicroTokenizer

MicroTokenizer: A lightweight, full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

☆159

Alternatives and similar repositories for MicroTokenizer

Users who are interested in MicroTokenizer are comparing it to the libraries listed below.

howl-anderson / Chinese_tokenizer_benchmark
Chinese tokenizer benchmark
☆26 · Sep 5, 2018 · Updated 7 years ago
howl-anderson / MicroHMM
A micro Python package for HMM (Hidden Markov Model)
☆15 · Apr 13, 2026 · Updated 3 months ago
howl-anderson / chinese-wikipedia-corpus-creator
Corpus creator for Chinese Wikipedia
☆41 · Jun 30, 2021 · Updated 5 years ago
howl-anderson / MicroRegEx
A micro regular expression engine
☆37 · Sep 22, 2019 · Updated 6 years ago
howl-anderson / rasa_chinese
rasa_chinese: an extension package of Rasa components built specifically for the Chinese language, providing many Chinese-specific components
☆150 · May 11, 2023 · Updated 3 years ago
howl-anderson / Chinese_models_for_SpaCy
Models for SpaCy that support Chinese
☆673 · Jan 4, 2025 · Updated last year
junchenfeng / programmer-commissar-manual
Programmer's political-work manual: how to build a united team of programmers
☆13 · May 2, 2019 · Updated 7 years ago
learncompiler / compiler-lectures
☆16 · Jul 21, 2020 · Updated 6 years ago
taojingcong / ccks2021FEE
CCKS 2021 event extraction competition
☆30 · Jul 21, 2021 · Updated 5 years ago
liuhuanyong / SougouWordsCollector
Crawler and format converter for Sogou Input Method word dictionaries
☆27 · Apr 25, 2018 · Updated 8 years ago
xlturing / machine-learning-journey
Implementations of selected algorithms in machine learning, natural language processing, and deep learning
☆41 · Jan 6, 2019 · Updated 7 years ago
DianboWork / M3T-CNERTA
☆11 · Aug 10, 2022 · Updated 3 years ago
zhanzecheng / Chinese_segment_augment
New-word discovery using mutual information and left/right entropy, implemented in Python 3
☆593 · Aug 1, 2019 · Updated 6 years ago
chapzq77 / LDA-SVM
Text classification using LDA + SVM
☆22 · Jul 23, 2017 · Updated 9 years ago
andy-yangz / easy_bert_pretrain
A very easy BERT pretraining process using the tokenizers and transformers repos
☆32 · Feb 27, 2020 · Updated 6 years ago
howl-anderson / seq2annotation
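A general-purpose sequence labeling library based on TensorFlow & PaddlePaddle (currently includes BiLSTM+CRF, Stacked-BiLSTM+CRF, and IDCNN+CRF, with more algorithms being added) that implements Chinese word segmentation (tokenizer / segmentation), part-of-speech tagging…
☆85 · Dec 8, 2022 · Updated 3 years ago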
Aurelius84 / N-gram
A project of N-gram model comparing FMM/BMM
☆17 · Oct 17, 2022 · Updated 3 years ago
yaleimeng / Final_word_Similarity
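A word-similarity calculation method that combines the extended Tongyici Cilin thesaurus with HowNet, giving broader vocabulary coverage and more accurate results.
☆744 · Feb 16, 2022 · Updated 4 years ago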
FudanNLP / fudan_mtl_reviews
TensorFlow implementation of the paper `Adversarial Multi-task Learning for Text Classification`
☆11 · Apr 11, 2018 · Updated 8 years ago
howl-anderson / MITIE_Chinese_Wikipedia_corpus
Pre-trained Wikipedia corpus by MITIE
☆51 · Sep 9, 2018 · Updated 7 years ago
zhanzecheng / The-Art-Of-Programming-By-July
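This project once reached number one worldwide; a collection of key material is at the very bottom of this page. A complete, polished print edition, 《编程之法：面试和算法心得》 (The Art of Programming: Interview and Algorithm Insights), is on sale at JD.com and Dangdang.
☆40 · Apr 6, 2018 · Updated 8 years ago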
NLP-LOVE / CodingInterviews2-ByPython
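Python 3 implementations of the algorithm interview questions from the second edition of 《剑指offer》 (Coding Interviews). As a classic book, it is worth revisiting, browsing, and memorizing from time to time. The project also aims to help Python programmers pass company technical interviews and land the offer they want.
☆118 · Jan 9, 2026 · Updated 6 months ago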
huydx / fulltext_engine
Simple inverted-index full-text search engine written in Python
☆13 · Oct 3, 2013 · Updated 12 years ago
jingyuanz / protonet-bert-text-classification
Fine-tune BERT for small-dataset text classification in a few-shot learning manner using ProtoNet
☆27 · Nov 25, 2020 · Updated 5 years ago
thunlp / OpenQA
The source code of ACL 2018 paper "Denoising Distantly Supervised Open-Domain Question Answering".
☆205 · Nov 7, 2018 · Updated 7 years ago
Moonshile / ChineseWordSegmentation
Chinese word segmentation algorithm that requires no corpus
☆499 · Sep 3, 2020 · Updated 5 years ago
rokid / better_jieba
Easy fine-grained word segmentation with Jieba
☆16 · Nov 21, 2019 · Updated 6 years ago
FudanNLP / fastHan
fastHan is a Chinese natural language processing tool built on fastNLP and PyTorch that is as convenient to call as spaCy.
☆761 · Dec 9, 2023 · Updated 2 years ago
FudanNLP / fastNLP
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
☆3,145 · Jun 5, 2023 · Updated 3 years ago
thunlp / OpenNRE
An Open-Source Package for Neural Relation Extraction (NRE)
☆4,470 · Jan 10, 2024 · Updated 2 years ago
HITlilingzhi / SMP2017ECDT-DATA
SMP2017 Chinese human-computer dialogue evaluation data
☆112 · Oct 19, 2017 · Updated 8 years ago
howl-anderson / hanzi_char_featurizer
A Chinese character feature extractor (featurizer) that extracts features of Chinese characters (pronunciation features, glyph features) for use as deep learning features
☆301 · Dec 29, 2025 · Updated 6 months ago
HIT-SCIR / pyltp
pyltp: the Python extension for LTP
☆1,541 · Jul 24, 2022 · Updated 3 years ago
SongRb / DeepDiveChineseApps
DeepDive Tutorial with Chinese Support
☆35 · Oct 3, 2021 · Updated 4 years ago
crownpku / Rasa_NLU_Chi
Turn Chinese natural language into structured data (Chinese natural language understanding)
☆1,532 · Jul 30, 2024 · Updated last year
howl-anderson / WeatherBot
A Rasa-based Chinese weather inquiry chatbot with a Web UI
☆241 · Feb 27, 2019 · Updated 7 years ago
nocater / baidu_nlp_project2
Kaikeba & 后厂理工学院, Baidu NLP Project 2: multi-label text classification on an exam-question dataset. Models: FastText, TextCNN, GCN, BERT, et al.
☆48 · Dec 18, 2019 · Updated 6 years ago
SchrodingerZhu / VNMCC
Very Naive MIPS CPU using Clash
☆29 · Oct 17, 2021 · Updated 4 years ago
synyi / poplar
A web-based annotation tool for natural language processing (NLP)
☆528 · Dec 11, 2022 · Updated 3 years ago