Vaporetto: Very accelerated pointwise prediction based tokenizer
★255 · Feb 7, 2026 · Updated 2 months ago
Alternatives and similar repositories for vaporetto
Users that are interested in vaporetto are comparing it to the libraries listed below.
- vibrato: Viterbi-based accelerated tokenizer ★405 · Feb 7, 2026 · Updated 2 months ago
- Japanese tokenizer for Transformers ★79 · Dec 15, 2023 · Updated 2 years ago
- Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. ★21 · Jun 1, 2025 · Updated 10 months ago
- A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust. ★249 · Updated this week
- Sentence boundary disambiguation tool for Japanese texts ★199 · Mar 26, 2024 · Updated 2 years ago
- Sudachi in Rust and new generation of SudachiPy ★436 · Apr 2, 2026 · Updated last week
- A multilingual morphological analysis library. ★610 · Updated this week
- A library for semantic similarity search ★26 · Jan 31, 2025 · Updated last year
- A furigana (reading annotation) dataset built from Japanese national bibliography data ★31 · Sep 21, 2021 · Updated 4 years ago
- This repository has implementations of data augmentation for NLP for Japanese. ★64 · Feb 16, 2023 · Updated 3 years ago
- An integrated Japanese analyzer based on foundation models ★141 · Apr 1, 2026 · Updated last week
- A Japanese named entity recognition dataset built from Wikipedia ★142 · Sep 2, 2023 · Updated 2 years ago
- Rust library of natural language dictionaries using character-wise double-array tries. ★37 · Jan 13, 2025 · Updated last year
- Japanese synonym library ★55 · Feb 7, 2022 · Updated 4 years ago
- A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse) ★20 · Apr 1, 2026 · Updated last week
- Darts-clone Python binding ★20 · Apr 23, 2022 · Updated 3 years ago
- The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing. ★125 · Nov 13, 2025 · Updated 4 months ago
- Evidence-based Explanation Dataset (AACL-IJCNLP 2020) ★18 · Dec 17, 2020 · Updated 5 years ago
- An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code. ★261 · Mar 30, 2026 · Updated last week
- ★52 · Sep 11, 2023 · Updated 2 years ago
- PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer. ★89 · Nov 3, 2023 · Updated 2 years ago
- Disambiguate Japanese heteronyms ★32 · Oct 3, 2023 · Updated 2 years ago
- ★24 · Mar 18, 2026 · Updated 3 weeks ago
- A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information ★132 · Mar 15, 2023 · Updated 3 years ago
- A Japanese NLP Library using spaCy as framework based on Universal Dependencies ★841 · Mar 30, 2024 · Updated 2 years ago
- A tool for visualizing the internal structures of morphological analyzer Sudachi ★18 · Jun 9, 2022 · Updated 3 years ago
- JGLUE: Japanese General Language Understanding Evaluation ★338 · Mar 31, 2025 · Updated last year
- Finding all pairs of similar documents time- and memory-efficiently ★62 · Mar 13, 2025 · Updated last year
- Namelti: The automatic transcription generation library for person names in Katakana ★21 · Jul 10, 2023 · Updated 2 years ago
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ★77 · Jun 23, 2023 · Updated 2 years ago
- Japanese data from the Google UDT 2.0. ★28 · Mar 24, 2023 · Updated 3 years ago
- An evaluation dataset for the honorific (keigo) conversion task ★21 · Nov 24, 2022 · Updated 3 years ago
- Repository of ACL2023 paper: Unbalanced Optimal Transport for Unbalanced Word Alignment ★38 · Sep 13, 2023 · Updated 2 years ago
- Japanese Morphological Analysis written in Rust ★83 · Dec 30, 2021 · Updated 4 years ago
- Are you SATySFi-ed with Nix? ★14 · Mar 6, 2023 · Updated 3 years ago
- A tool for comparing tokenizers ★121 · Nov 9, 2025 · Updated 5 months ago
- Awesome List of Sources of Japanese Censored Words ★19 · Sep 11, 2022 · Updated 3 years ago
- A Japanese Tokenizer for Business ★952 · Jun 17, 2025 · Updated 9 months ago
- Rust implementation of SIF and uSIF: Simple and fast sentence embedding ★19 · Jan 22, 2025 · Updated last year