Vaporetto: Very accelerated pointwise prediction based tokenizer
★252 · Feb 7, 2026 · Updated last month
Alternatives and similar repositories for vaporetto
Users who are interested in vaporetto are comparing it to the libraries listed below.
- vibrato: Viterbi-based accelerated tokenizer ★399 · Feb 7, 2026 · Updated last month
- Japanese tokenizer for Transformers ★79 · Dec 15, 2023 · Updated 2 years ago
- A multilingual morphological analysis library. ★606 · Feb 27, 2026 · Updated last week
- A furigana (reading annotation) dataset created from Japanese National Bibliography data ★28 · Sep 21, 2021 · Updated 4 years ago
- Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器) ★199 · Mar 26, 2024 · Updated last year
- Sudachi in Rust and new generation of SudachiPy ★428 · Updated this week
- This repository has implementations of data augmentation for NLP for Japanese. ★64 · Feb 16, 2023 · Updated 3 years ago
- Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. ★21 · Jun 1, 2025 · Updated 9 months ago
- A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust. ★244 · Jan 26, 2026 · Updated last month
- A library for semantic similarity search ★26 · Jan 31, 2025 · Updated last year
- Rust library of natural language dictionaries using character-wise double-array tries. ★37 · Jan 13, 2025 · Updated last year
- A Japanese named entity recognition dataset built from Wikipedia ★142 · Sep 2, 2023 · Updated 2 years ago
- Japanese synonym library ★55 · Feb 7, 2022 · Updated 4 years ago
- Disambiguate Japanese heteronyms ★32 · Oct 3, 2023 · Updated 2 years ago
- The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing. ★125 · Nov 13, 2025 · Updated 3 months ago
- An integrated Japanese analyzer based on foundation models ★138 · Mar 2, 2026 · Updated last week
- An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code. ★261 · Mar 1, 2026 · Updated last week
- A Japanese NLP Library using spaCy as framework based on Universal Dependencies ★834 · Mar 30, 2024 · Updated last year
- JGLUE: Japanese General Language Understanding Evaluation ★337 · Mar 31, 2025 · Updated 11 months ago
- A tool for comparing tokenizers ★121 · Nov 9, 2025 · Updated 4 months ago
- A tool for visualizing the internal structures of morphological analyzer Sudachi ★18 · Jun 9, 2022 · Updated 3 years ago
- ★51 · Sep 11, 2023 · Updated 2 years ago
- A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse) ★20 · Mar 15, 2025 · Updated 11 months ago
- PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer. ★89 · Nov 3, 2023 · Updated 2 years ago
- A Japanese Tokenizer for Business ★942 · Jun 17, 2025 · Updated 8 months ago
- Japanese word embedding with Sudachi and NWJC ★171 · Mar 1, 2024 · Updated 2 years ago
- A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information ★131 · Mar 15, 2023 · Updated 2 years ago
- Finding all pairs of similar documents time- and memory-efficiently ★62 · Mar 13, 2025 · Updated 11 months ago
- Evidence-based Explanation Dataset (AACL-IJCNLP 2020) ★18 · Dec 17, 2020 · Updated 5 years ago
- Are you SATySFi-ed with Nix? ★14 · Mar 6, 2023 · Updated 3 years ago
- ★24 · Jan 27, 2025 · Updated last year
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ★77 · Jun 23, 2023 · Updated 2 years ago
- Darts-clone Python binding ★20 · Apr 23, 2022 · Updated 3 years ago
- Japanese data from the Google UDT 2.0. ★28 · Mar 24, 2023 · Updated 2 years ago
- A rule-based analyzer that extracts and normalizes temporal expressions written in natural language ★140 · Feb 27, 2025 · Updated last year
- An evaluation dataset for the honorific (keigo) conversion task ★21 · Nov 24, 2022 · Updated 3 years ago
- Rust implementation of SIF and uSIF: Simple and fast sentence embedding ★19 · Jan 22, 2025 · Updated last year
- A Japanese tokenizer based on recurrent neural networks ★412 · Feb 12, 2026 · Updated 3 weeks ago
- Japanese Morphological Analysis written in Rust ★83 · Dec 30, 2021 · Updated 4 years ago