mingruimingrui/ICU-tokenizer

mingruimingrui / ICU-tokenizer

ICU based universal language tokenizer

☆34

Alternatives and similar repositories for ICU-tokenizer

Users who are interested in ICU-tokenizer are comparing it to the libraries listed below.

mrpeerat / CL-ReLKT
The implementation of CL-ReLKT (NAACL-2022)
☆14 · Aug 31, 2022 · Updated 3 years ago
takuti / datadog-anomaly-detector
Anomaly detection system for Datadog multiple metrics
☆23 · Nov 11, 2016 · Updated 9 years ago
wooorm / trigrams
Trigram files for 500+ languages
☆24 · Mar 21, 2025 · Updated last year
ropensci / binman
A Binary Download Manager
☆16 · Jul 25, 2023 · Updated 3 years ago
Jana-Z / german-sentiment-lexicon
A German lexicon with words associated with love, fear, joy, disgust, surprise, contempt and anger
☆12 · Nov 15, 2023 · Updated 2 years ago
openeventdata / Dictionaries
PETRARCH actor, agent and verb dictionaries
☆22 · Aug 3, 2018 · Updated 7 years ago
soramame0518 / j-mfd
Japanese Moral Foundations Dictionary (J-MFD)
☆17 · Jan 12, 2022 · Updated 4 years ago
mingruimingrui / fast-mosestokenizer
C++ mosestokenizer
☆18 · Mar 13, 2024 · Updated 2 years ago
Hamza5 / multilevel-diacritizer
Extensible DL-based automatic Arabic diacritization tool allowing the restoration of different types of diacritics.
☆24 · Jul 25, 2023 · Updated 3 years ago
koheiw / wordvector
Train word and document vectors using quanteda
☆16 · Updated this week
cookielee77 / RankGan-NIPS2017
Tensorflow implementation of RankGan (Adversarial Ranking for Language Generation)
☆22 · Jun 15, 2018 · Updated 8 years ago
HillZhang1999 / RobustGEC
Code & Data for our Paper "RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation" (EMNLP 2023)
☆17 · Jan 23, 2024 · Updated 2 years ago
r-lib / sparsevctrs
Sparse vector class using ALTREP
☆26 · Jan 27, 2026 · Updated 5 months ago
bltlab / mot
Multilingual Open Text
☆26 · May 8, 2025 · Updated last year
trueto / ERNIE2Torch
A script for converting ERNIE1.0 or ERNIE2.0 to transformers' BERT format
☆10 · Mar 28, 2020 · Updated 6 years ago
yanzhangnlp / BSL
Bootstrapped Unsupervised Sentence Representation Learning (ACL 2021)
☆30 · Apr 27, 2022 · Updated 4 years ago
ChristopherLucas / translateR
R Package for Cross-Language Topic Modeling
☆21 · Oct 25, 2022 · Updated 3 years ago
ufal / low-resource-gec-wnut2019
Source code for paper Grammatical Error Correction in Low-Resource Scenarios (W-NUT 2019)
☆13 · Jun 21, 2022 · Updated 4 years ago
DinLei / DoubleArrayTrie
Python implementation of a double-array trie
☆11 · Jul 23, 2018 · Updated 8 years ago
yu961549745 / VSCodeHighlightForMaple
Maple Highlight files for Visual Studio Code
☆12 · Mar 15, 2019 · Updated 7 years ago
Yifan-Gao / open_retrieval_conversational_machine_reading
Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset
☆13 · Nov 19, 2022 · Updated 3 years ago
bernhard-42 / ssh_ipykernel
A remote jupyter kernel via ssh
☆19 · Sep 8, 2023 · Updated 2 years ago
koheiw / proxyC
R package for large-scale similarity/distance computation
☆30 · Feb 27, 2026 · Updated 4 months ago
zseder / hundict
Bilingual dictionary extractor from parallel corpora
☆24 · Jul 3, 2014 · Updated 12 years ago
songliuchen / echarts
Fixes the echarts graph limitation where multiple edges cannot be added between nodes, and displays a node's self-loop edges
☆14 · Dec 6, 2024 · Updated last year
lemuria-wchen / DxFormer
Code for our Bioinformatics 2022 paper: "DxFormer: A Decoupled Automatic Diagnostic System Based on Decoder-Encoder Transformer with Dens…
☆11 · Dec 24, 2022 · Updated 3 years ago
Datastory-CN / ASQP-Datasets
☆16 · Aug 23, 2023 · Updated 2 years ago
momo-journey / mbart-chinese
Chinese generation tasks with MBart, the multilingual denoising pre-trained model
☆11 · May 27, 2021 · Updated 5 years ago
oscar-project / ungoliant
The pipeline for the OSCAR corpus
☆178 · Nov 9, 2025 · Updated 8 months ago
adrianeboyd / boyd-wnut2018
Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018)
☆17 · Jul 16, 2024 · Updated 2 years ago
ratthachat / prompt_engineering
Prompt Engineering
☆15 · Aug 31, 2021 · Updated 4 years ago
graehl / carmel
Finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests
☆41 · Oct 14, 2022 · Updated 3 years ago
Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Jul 1, 2026 · Updated 3 weeks ago
kbatsuren / wiktra
Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies)
☆37 · Jun 29, 2025 · Updated last year
codertimo / python-template
python project template for personal projects! 🙋‍♀️
☆11 · Nov 28, 2020 · Updated 5 years ago
kb-labb / kb_bart
Pretraining scripts for BART transformer model
☆12 · May 15, 2023 · Updated 3 years ago
UKPLab / emnlp2021-prompt-ft-heuristics
☆10 · Sep 27, 2021 · Updated 4 years ago
GoFigure-LANL / VisHash
Visual Hash for matching copies of visually similar images.
☆16 · Mar 17, 2025 · Updated last year
Jacob-Zhou / gecdi
The repo of "Improving Seq2Seq Grammatical Error Correction via Decoding Interventions"
☆32 · Jan 22, 2024 · Updated 2 years ago