byronhe / cppjieba
C++ version of the "Jieba" Chinese word segmenter, using the darts Double Array Trie to reduce memory usage to 1/100
☆50 · Updated 2 years ago
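The description above credits the memory savings to a double-array trie. As a rough illustration of that data structure (not darts' actual API or construction algorithm), the C++ sketch below flattens a trie into two integer arrays, `base` and `check`. The child of state `s` on code `c` lives at slot `base[s] + c`, and that slot is valid only if `check[base[s] + c] == s`. The class name `DoubleArrayTrie`, its methods, the byte-plus-one coding, and the use of code 0 as an end-of-word marker are all hypothetical choices made for this sketch.

```cpp
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal double-array trie for illustration; NOT the darts API.
// Byte b is encoded as code b + 1; code 0 marks "a word ends here".
class DoubleArrayTrie {
 public:
  explicit DoubleArrayTrie(const std::vector<std::string>& words) {
    // 1) Build an ordinary pointer-based trie.
    Node root;
    for (const std::string& w : words) {
      Node* n = &root;
      for (unsigned char ch : w) {
        std::unique_ptr<Node>& child = n->next[ch + 1];
        if (!child) child = std::make_unique<Node>();
        n = child.get();
      }
      if (!n->next[0]) n->next[0] = std::make_unique<Node>();  // end-of-word edge
    }
    // 2) Flatten it breadth-first into base/check; state 0 is the root.
    Grow(1);
    check_[0] = kRoot;
    std::queue<std::pair<const Node*, int>> pending;
    pending.push({&root, 0});
    while (!pending.empty()) {
      auto [node, s] = pending.front();
      pending.pop();
      if (node->next.empty()) continue;
      const int b = FindBase(*node);
      base_[s] = b;
      for (const auto& kv : node->next) check_[b + kv.first] = s;  // claim slots
      for (const auto& kv : node->next)
        if (kv.first != 0) pending.push({kv.second.get(), b + kv.first});
    }
  }

  // Exact-match lookup: walk one byte at a time, then require an end-of-word edge.
  bool Contains(const std::string& word) const {
    int s = 0;
    for (unsigned char ch : word) {
      s = Next(s, ch + 1);
      if (s < 0) return false;
    }
    return Next(s, 0) >= 0;
  }

  // Byte lengths of every dictionary word that is a prefix of `text`,
  // the kind of query a dictionary-based segmenter issues at each position.
  std::vector<size_t> CommonPrefixLengths(const std::string& text) const {
    std::vector<size_t> out;
    int s = 0;
    for (size_t i = 0; i <= text.size(); ++i) {
      if (Next(s, 0) >= 0) out.push_back(i);
      if (i == text.size()) break;
      s = Next(s, static_cast<unsigned char>(text[i]) + 1);
      if (s < 0) break;
    }
    return out;
  }

 private:
  struct Node {
    std::map<int, std::unique_ptr<Node>> next;  // code -> child
  };
  static constexpr int kFree = -1;  // slot not yet used
  static constexpr int kRoot = -2;  // the root has no parent

  void Grow(size_t n) {
    if (check_.size() < n) {
      base_.resize(n, 0);
      check_.resize(n, kFree);
    }
  }

  // First-fit search for a base where every child slot is still free.
  int FindBase(const Node& node) {
    for (int b = 1;; ++b) {
      bool fits = true;
      for (const auto& kv : node.next) {
        const size_t t = static_cast<size_t>(b + kv.first);
        Grow(t + 1);
        if (check_[t] != kFree) { fits = false; break; }
      }
      if (fits) return b;
    }
  }

  // Transition: valid only if the target slot records `s` as its parent.
  int Next(int s, int code) const {
    const int t = base_[s] + code;
    if (t >= static_cast<int>(check_.size()) || check_[t] != s) return -1;
    return t;
  }

  std::vector<int> base_;
  std::vector<int> check_;
};
```

Because keys are handled as raw bytes, UTF-8 dictionary words such as "中国" work unchanged. The greedy first-fit placement keeps the sketch short but scans from slot 1 for every node, so it is slow for large dictionaries; real implementations typically track free slots to speed this up.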
Alternatives and similar repositories for cppjieba
Users that are interested in cppjieba are comparing it to the libraries listed below
- transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP) ☆17 · Updated 3 years ago
- A clone of Darts (Double-ARray Trie System) ☆147 · Updated this week
- C API for CppJieba ☆57 · Updated 2 years ago
- High-performance text tokenizer library ☆28 · Updated last year
- KSAI Lite is a deep learning inference framework from Kingsoft, based on TensorFlow Lite ☆95 · Updated 2 years ago
- BERT Tokenizer in C++ ☆76 · Updated 4 years ago
- [This project is no longer maintained] Converts Chinese characters to pinyin, with support for polyphonic characters: 拼音 -> pin yin ☆210 · Updated last week
- An Efficient Lexical Analyzer for Chinese ☆42 · Updated 5 years ago
- Pinyin data for Chinese words ☆481 · Updated last month
- C++ model training and inference framework ☆224 · Updated 5 years ago
- C++ implementation of the mmseg word segmentation algorithm ☆33 · Updated 9 years ago
- Lightweight speech recognition decoding and inference framework trimmed down from Kaldi; currently implements MFCC+GMM+Viterbi and does not depend on libraries such as OpenFST or OpenBLAS ☆21 · Updated 3 years ago
- A Chinese corpus annotated with part-of-speech tags ☆203 · Updated 10 years ago
- Port of FunASR's Paraformer model to C/C++ ☆31 · Updated 10 months ago
- a Chinese tokenizer ☆17 · Updated 11 years ago
- ☆125 · Updated 4 years ago
- C++ headers (hpp) library with Python style ☆132 · Updated last month
- The simple header file library of CppJieba ☆41 · Updated 9 years ago
- A Python package that converts written Chinese strings into spoken-form strings ☆120 · Updated 5 years ago
- DaCiDian is an open-source Mandarin Chinese lexicon for automatic speech recognition (ASR) ☆302 · Updated 4 years ago
- Somiao Pinyin (搜喵拼音输入法): train your own Chinese input method with a seq2seq model ☆267 · Updated 5 years ago
- Large-scale Chinese corpus ☆41 · Updated 5 years ago
- Python | Efficient use of the kenlm statistical language model: new-word discovery, word segmentation, smart error correction, and more ☆164 · Updated 5 years ago
- Some public NLP resources; some are shared as originally published by others, and some have been lightly processed ☆57 · Updated 9 years ago
- Chinese-character-to-pinyin conversion with a smaller memory footprint and faster conversion ☆39 · Updated 9 years ago
- Onnxruntime Builder ☆50 · Updated last week
- Extracts speech recognition training data from videos, for training automatic subtitle generation ☆53 · Updated 6 years ago
- simple-pinyin: a simple pinyin input method (pinyin to Chinese characters) based on a hidden Markov model ☆46 · Updated last month
- Performance benchmarks of major Chinese word segmenters ☆157 · Updated 6 years ago
- A self-implemented Pinyin2Chinese demo using a trie and an HMM: a simple implementation of pinyin segmentation and pinyin-to-Chinese conversion ☆86 · Updated 7 years ago