taishi-i/toiro


A tool for comparing tokenizers

☆122

Alternatives and similar repositories for toiro

Users who are interested in toiro are comparing it to the libraries listed below.

himkt / konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
☆263 · Jul 19, 2026 · Updated last week
himkt / awesome-bert-japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
☆132 · Mar 15, 2023 · Updated 3 years ago
megagonlabs / UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
☆28 · Mar 24, 2023 · Updated 3 years ago
megagonlabs / ginza-transformers
Use custom tokenizers in spacy-transformers
☆16 · Aug 9, 2022 · Updated 3 years ago
megagonlabs / bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
☆200 · Mar 26, 2024 · Updated 2 years ago
megagonlabs / jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
☆77 · Jun 23, 2023 · Updated 3 years ago
yagays / nayose-wikipedia-ja
A Japanese name-matching (nayose) dataset built from Wikipedia
☆35 · Mar 10, 2020 · Updated 6 years ago
WorksApplications / chikkarpy
Japanese synonym library
☆55 · Feb 7, 2022 · Updated 4 years ago
sonoisa / clip-japanese
Japanese CLIP model
☆13 · Sep 15, 2025 · Updated 10 months ago
megagonlabs / ginza
A Japanese NLP Library using spaCy as framework based on Universal Dependencies
☆865 · Jul 10, 2026 · Updated 2 weeks ago
akirakubo / bert-japanese-aozora
Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy
☆40 · Aug 8, 2020 · Updated 5 years ago
chakki-works / Japanese-Company-Lexicon
☆99 · Jul 23, 2023 · Updated 3 years ago
BandaiNamcoResearchInc / DistilBERT-base-jp
☆161 · Oct 19, 2020 · Updated 5 years ago
daac-tools / vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
☆297 · Jul 20, 2026 · Updated last week
taishi-i / nagisa
A Japanese tokenizer based on recurrent neural networks
☆418 · Jul 6, 2026 · Updated 3 weeks ago
WorksApplications / chiVe
Japanese word embedding with Sudachi and NWJC 🌿
☆177 · Mar 1, 2024 · Updated 2 years ago
nobu-g / cohesion-analysis
Code for COLING 2020 Paper
☆13 · Feb 3, 2026 · Updated 5 months ago
ku-nlp / kwja
An integrated Japanese analyzer based on foundation models
☆145 · Jul 18, 2026 · Updated last week
WorksApplications / SudachiPy
Python version of Sudachi, a Japanese tokenizer.
☆442 · Oct 7, 2022 · Updated 3 years ago
polm / fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
☆533 · Oct 24, 2025 · Updated 9 months ago
PKSHATechnology-Research / camphr
Camphr - NLP library for creating pipeline components
☆336 · Dec 9, 2022 · Updated 3 years ago
KoichiYasuoka / SuPar-UniDic
Tokenizer, POS-tagger, lemmatizer, and dependency-parser for modern and contemporary Japanese with BERT models
☆21 · Feb 28, 2026 · Updated 5 months ago
chakki-works / chariot
Deliver the ready-to-train data to your NLP model.
☆123 · Jul 15, 2022 · Updated 4 years ago
WorksApplications / SudachiTra
Japanese tokenizer for Transformers
☆81 · Dec 15, 2023 · Updated 2 years ago
taishi-i / nagisa-tutorial-pycon2019
Code for PyCon JP 2019 talk "Japanese Natural Language Processing with Python: Real-World Text Analysis via Sequence Labeling"
☆48 · Nov 7, 2019 · Updated 6 years ago
wwwcojp / ja_sentence_segmenter
Japanese sentence segmentation library for Python
☆76 · Updated this week
kajyuuen / funer
Funer is a rule-based Named Entity Recognition tool.
☆22 · Apr 21, 2022 · Updated 4 years ago
chemicaltree / tetra
☆10 · Sep 14, 2022 · Updated 3 years ago
takapy0210 / nlplot
Visualization Module for Natural Language Processing
☆238 · Sep 21, 2022 · Updated 3 years ago
singletongue / wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
☆78 · Apr 9, 2024 · Updated 2 years ago
masayu-a / NAIST-JENE
☆10 · Aug 13, 2012 · Updated 13 years ago
chakki-works / chABSA-dataset
chakki's Aspect-Based Sentiment Analysis dataset
☆142 · Feb 25, 2022 · Updated 4 years ago
upura / weekly-kaggle-news-archive
https://weeklykagglenews.substack.com
☆24 · Dec 31, 2022 · Updated 3 years ago
uribo / bucky
Helpers for literature management as GitHub actions
☆13 · May 7, 2021 · Updated 5 years ago
daac-tools / python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. (Python wrapper)
☆21 · May 30, 2026 · Updated last month
vaaaaanquish / nishika_akutagawa_2nd_prize
Nishika Akutagawa competition 2nd prize: https://www.nishika.com/competitions/1/summary
☆25 · Mar 6, 2020 · Updated 6 years ago
WorksApplications / Sudachi
A Japanese Tokenizer for Business
☆990 · Jul 14, 2026 · Updated 2 weeks ago
kajyuuen / daaja
This repository has implementations of data augmentation for NLP for Japanese.
☆64 · Feb 16, 2023 · Updated 3 years ago
asakura-data-science / finance
☆21 · Feb 28, 2022 · Updated 4 years ago