A tool for comparing tokenizers
☆122 · Nov 9, 2025 · Updated 6 months ago
Alternatives and similar repositories for toiro
Users that are interested in toiro are comparing it to the libraries listed below.
- An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes. ☆261 · Updated this week
- A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information ☆132 · Mar 15, 2023 · Updated 3 years ago
- Japanese data from the Google UDT 2.0. ☆28 · Mar 24, 2023 · Updated 3 years ago
- Use custom tokenizers in spacy-transformers ☆16 · Aug 9, 2022 · Updated 3 years ago
- Sentence boundary disambiguation tool for Japanese texts ☆199 · Mar 26, 2024 · Updated 2 years ago
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ☆77 · Jun 23, 2023 · Updated 2 years ago
- Japanese name-matching dataset created from Wikipedia ☆35 · Mar 10, 2020 · Updated 6 years ago
- Japanese synonym library ☆55 · Feb 7, 2022 · Updated 4 years ago
- Japanese CLIP model ☆13 · Sep 15, 2025 · Updated 7 months ago
- A Japanese NLP Library using spaCy as framework based on Universal Dependencies ☆846 · Mar 30, 2024 · Updated 2 years ago
- Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy ☆40 · Aug 8, 2020 · Updated 5 years ago
- ☆100 · Jul 23, 2023 · Updated 2 years ago
- A Japanese tokenizer based on recurrent neural networks ☆417 · Feb 12, 2026 · Updated 2 months ago
- ☆161 · Oct 19, 2020 · Updated 5 years ago
- Japanese word embedding with Sudachi and NWJC