Japanese tokenizer for Transformers
☆79 · Dec 15, 2023 · Updated 2 years ago
Alternatives and similar repositories for SudachiTra
Users interested in SudachiTra are comparing it to the libraries listed below.
- PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer. ☆89 · Nov 3, 2023 · Updated 2 years ago
- ☆24 · Jan 27, 2025 · Updated last year
- Japanese data from the Google UDT 2.0. ☆28 · Mar 24, 2023 · Updated 2 years ago
- Japanese word embedding with Sudachi and NWJC 🌿 ☆170 · Mar 1, 2024 · Updated 2 years ago
- Japanese synonym library ☆55 · Feb 7, 2022 · Updated 4 years ago
- Use custom tokenizers in spacy-transformers ☆16 · Aug 9, 2022 · Updated 3 years ago
- 📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information ☆131 · Mar 15, 2023 · Updated 2 years ago
- Japanese named entity recognition dataset built from Wikipedia ☆142 · Sep 2, 2023 · Updated 2 years ago
- JGLUE: Japanese General Language Understanding Evaluation ☆335 · Mar 31, 2025 · Updated 11 months ago
- 🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer ☆252 · Feb 7, 2026 · Updated 3 weeks ago
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ☆77 · Jun 23, 2023 · Updated 2 years ago
- Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器) ☆199 · Mar 26, 2024 · Updated last year
- Pre-training Language Models for Japanese ☆50 · Jul 2, 2023 · Updated 2 years ago
- A Japanese NLP Library using spaCy as framework based on Universal Dependencies