MicroTokenizer: A lightweight yet full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.
☆158 · Oct 18, 2024 · Updated last year
Alternatives and similar repositories for MicroTokenizer
Users who are interested in MicroTokenizer are comparing it to the libraries listed below.
- A micro Python package for HMM (Hidden Markov Model) ☆15 · Updated this week
- Corpus creator for Chinese Wikipedia ☆41 · Jun 30, 2021 · Updated 4 years ago
- A micro regular expression engine ☆37 · Sep 22, 2019 · Updated 6 years ago
- rasa_chinese: a rasa component extension package built specifically for Chinese, providing many Chinese-language components ☆151 · May 11, 2023 · Updated 2 years ago
- Models for SpaCy that support Chinese ☆674 · Jan 4, 2025 · Updated last year
- Handbook of political work for programmers: how to build a united team of programmers ☆13 · May 2, 2019 · Updated 6 years ago
- ☆16 · Jul 21, 2020 · Updated 5 years ago
- Financial Brain: financial intelligence NLP service competition ☆17 · Apr 27, 2019 · Updated 6 years ago
- ccks2021 event extraction competition ☆30 · Jul 21, 2021 · Updated 4 years ago
- Word dictionary crawler and format converter for the Sogou input method (搜狗输入法) ☆26 · Apr 25, 2018 · Updated 7 years ago
- ☆11 · Aug 10, 2022 · Updated 3 years ago
- New word discovery using mutual information and left/right entropy, implemented in Python 3 ☆592 · Aug 1, 2019 · Updated 6 years ago
- N-grams approximate string matching implementation in pure Python ☆26 · Sep 20, 2010 · Updated 15 years ago
- A very simple BERT pretraining process using the tokenizers and transformers repos ☆32 · Feb 27, 2020 · Updated 6 years ago
- Text classification using LDA + SVM ☆22 · Jul 23, 2017 · Updated 8 years ago
- A general-purpose sequence labeling library based on TensorFlow & PaddlePaddle (currently includes BiLSTM+CRF, Stacked-BiLSTM+CRF, and IDCNN+CRF, with more algorithms being added) for Chinese word segmentation (Tokenizer / segmentation), part-of-speech tagging… ☆85 · Dec 8, 2022 · Updated 3 years ago
- An N-gram model project comparing FMM/BMM ☆17 · Oct 17, 2022 · Updated 3 years ago
- A word similarity method combining the extended Tongyici Cilin (同义词词林扩展版) and HowNet (知网), with broader vocabulary coverage and more accurate results. ☆744 · Feb 16, 2022 · Updated 4 years ago
- Google's MediaPipe (v0.8.9) and Python Wheel installer for Jetson Nano (JetPack 4.6) compiled for CUDA 10.2 ☆16 · Jun 7, 2023 · Updated 2 years ago
- Pre-trained Wikipedia corpus by MITIE ☆51 · Sep 9, 2018 · Updated 7 years ago
- TensorFlow implementation of the paper `Adversarial Multi-task Learning for Text Classification` ☆11 · Apr 11, 2018 · Updated 8 years ago
- A Query Language and System for Python Objects ☆26 · Oct 20, 2017 · Updated 8 years ago
- This project once reached No. 1 worldwide; a collection of highlights is at the bottom of this page. The complete, polished print edition, 《编程之法:面试和算法心得》, is on sale at JD.com/Dangdang ☆40 · Apr 6, 2018 · Updated 8 years ago
- A simple neural network framework implemented with numpy ☆15 · Oct 3, 2018 · Updated 7 years ago
- The source code of ACL 2018 paper "Denoising Distantly Supervised Open-Domain Question Answering". ☆205 · Nov 7, 2018 · Updated 7 years ago
- Python 3 implementations of the algorithm interview questions in the second edition of 《剑指offer》. As a classic book, it is worth revisiting and reviewing regularly. The project also aims to help Python programmers pass technical interviews and land the offer they want. ☆119 · Jan 9, 2026 · Updated 3 months ago
- ccks baidu entity link: first-place entity linking solution ☆841 · Dec 19, 2023 · Updated 2 years ago
- Healthy eating, healthy living: a beginner's fitness journey, recording fitness and nutrition knowledge. ☆11 · Jun 10, 2018 · Updated 7 years ago
- Fine-tune BERT for small-dataset text classification in a few-shot learning manner using ProtoNet ☆27 · Nov 25, 2020 · Updated 5 years ago
- Chinese word segmentation algorithm without a corpus ☆500 · Sep 3, 2020 · Updated 5 years ago
- fastHan is a Chinese natural language processing tool built on fastNLP and pytorch, as easy to call as spacy. ☆762 · Dec 9, 2023 · Updated 2 years ago
- fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation. ☆3,147 · Jun 5, 2023 · Updated 2 years ago
- Easy fine-grained word segmentation with Jieba (结巴) ☆16 · Nov 21, 2019 · Updated 6 years ago
- A Chinese character feature extractor (featurizer) that extracts character features (pronunciation features, glyph features) for use as deep learning features ☆298 · Dec 29, 2025 · Updated 3 months ago
- An Open-Source Package for Neural Relation Extraction (NRE) ☆4,454 · Jan 10, 2024 · Updated 2 years ago
- DeepDive Tutorial with Chinese Support ☆35 · Oct 3, 2021 · Updated 4 years ago
- pyltp: the python extension for LTP ☆1,548 · Jul 24, 2022 · Updated 3 years ago
- ShadowCoel is a ss/ssr client based on Potatso ☆19 · Jun 25, 2019 · Updated 6 years ago
- Kaikeba (开课吧) & 后厂理工学院, Baidu NLP Project 2: multi-label text classification on an exam-question dataset. Models: FastText, TextCNN, GCN, BERT, et al. ☆47 · Dec 18, 2019 · Updated 6 years ago