daac-tools / vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
★251 · Feb 7, 2026 · Updated last week
Alternatives and similar repositories for vaporetto
Users who are interested in vaporetto are comparing it to the libraries listed below:
- 🎤 vibrato: Viterbi-based accelerated tokenizer ★398 · Feb 7, 2026 · Updated last week
- Japanese tokenizer for Transformers ★79 · Dec 15, 2023 · Updated 2 years ago
- Furigana dataset created from Japanese National Bibliography data ★28 · Sep 21, 2021 · Updated 4 years ago
- A multilingual morphological analysis library. ★601 · Updated this week
- Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器) ★199 · Mar 26, 2024 · Updated last year
- Sudachi in Rust 🦀 and new generation of SudachiPy ★425 · Jan 7, 2026 · Updated last month
- This repository has implementations of data augmentation for NLP for Japanese. ★64 · Feb 16, 2023 · Updated 3 years ago
- 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust. ★242 · Jan 26, 2026 · Updated 3 weeks ago
- 🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. ★21 · Jun 1, 2025 · Updated 8 months ago
- 🦞 Rust library of natural language dictionaries using character-wise double-array tries. ★36 · Jan 13, 2025 · Updated last year
- A library for semantic similarity search ★26 · Jan 31, 2025 · Updated last year
- Japanese named entity recognition dataset built from Wikipedia ★142 · Sep 2, 2023 · Updated 2 years ago
- Japanese synonym library ★55 · Feb 7, 2022 · Updated 4 years ago
- Disambiguate Japanese heteronyms ★32 · Oct 3, 2023 · Updated 2 years ago
- The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing. ★125 · Nov 13, 2025 · Updated 3 months ago
- An integrated Japanese analyzer based on foundation models ★138 · Feb 2, 2026 · Updated 2 weeks ago
- 🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code. ★261 · Apr 29, 2025 · Updated 9 months ago
- A Japanese NLP Library using spaCy as framework based on Universal Dependencies ★832 · Mar 30, 2024 · Updated last year
- JGLUE: Japanese General Language Understanding Evaluation ★333 · Mar 31, 2025 · Updated 10 months ago
- A tool for comparing tokenizers ★121 · Nov 9, 2025 · Updated 3 months ago
- A tool for visualizing the internal structures of morphological analyzer Sudachi ★18 · Jun 9, 2022 · Updated 3 years ago
- ★51 · Sep 11, 2023 · Updated 2 years ago
- 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse) ★20 · Mar 15, 2025 · Updated 11 months ago
- PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer. ★89 · Nov 3, 2023 · Updated 2 years ago
- A Japanese Tokenizer for Business ★931 · Jun 17, 2025 · Updated 8 months ago
- Japanese word embedding with Sudachi and NWJC 🌿 ★169 · Mar 1, 2024 · Updated last year
- A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information ★131 · Mar 15, 2023 · Updated 2 years ago
- Finding all pairs of similar documents time- and memory-efficiently ★62 · Mar 13, 2025 · Updated 11 months ago
- Evidence-based Explanation Dataset (AACL-IJCNLP 2020) ★18 · Dec 17, 2020 · Updated 5 years ago
- Are you SATySFi-ed with Nix? ★14 · Mar 6, 2023 · Updated 2 years ago
- ★24 · Jan 27, 2025 · Updated last year
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ★77 · Jun 23, 2023 · Updated 2 years ago
- Japanese data from the Google UDT 2.0. ★28 · Mar 24, 2023 · Updated 2 years ago
- Darts-clone python binding ★20 · Apr 23, 2022 · Updated 3 years ago
- Rule-based analyzer that extracts and normalizes temporal expressions written in natural language ★140 · Feb 27, 2025 · Updated 11 months ago
- Rust implementation of SIF and uSIF: Simple and fast sentence embedding ★19 · Jan 22, 2025 · Updated last year
- Evaluation dataset for the honorific (keigo) conversion task ★21 · Nov 24, 2022 · Updated 3 years ago
- Japanese Morphological Analysis written in Rust ★82 · Dec 30, 2021 · Updated 4 years ago
- Japanese sentence segmentation library for Python ★73 · Apr 3, 2023 · Updated 2 years ago