🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
☆270 · Feb 7, 2026 · Updated 2 months ago
Alternatives and similar repositories for vaporetto
Users who are interested in vaporetto are comparing it to the libraries listed below.
- 🎤 vibrato: Viterbi-based accelerated tokenizer ☆410 · Feb 7, 2026 · Updated 2 months ago
- Japanese tokenizer for Transformers ☆79 · Dec 15, 2023 · Updated 2 years ago
- 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust. ☆259 · Updated this week
- 🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. ☆21 · Jun 1, 2025 · Updated 10 months ago
- Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器) ☆199 · Mar 26, 2024 · Updated 2 years ago
- Sudachi in Rust 🦀 and the new generation of SudachiPy ☆439 · Apr 20, 2026 · Updated last week
- A multilingual morphological analysis library. ☆618 · Updated this week
- A library for semantic similarity search ☆26 · Jan 31, 2025 · Updated last year
- A furigana (reading) dataset created from the Japanese National Bibliography data ☆31 · Sep 21, 2021 · Updated 4 years ago
- Implementations of data augmentation methods for Japanese NLP. ☆64 · Feb 16, 2023 · Updated 3 years ago
- An integrated Japanese analyzer based on foundation models ☆142 · Apr 6, 2026 · Updated 3 weeks ago
- A Japanese named entity recognition dataset built from Wikipedia ☆142 · Sep 2, 2023 · Updated 2 years ago
- 🦞 Rust library of natural language dictionaries using character-wise double-array tries. ☆37 · Jan 13, 2025 · Updated last year
- Japanese synonym library ☆55 · Feb 7, 2022 · Updated 4 years ago
- 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure (Python wrapper for daachorse). ☆21 · Apr 18, 2026 · Updated last week
- Darts-clone Python binding ☆20 · Apr 23, 2022 · Updated 4 years ago
- The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing. ☆126 · Apr 10, 2026 · Updated 2 weeks ago
- Evidence-based Explanation Dataset (AACL-IJCNLP 2020) ☆18 · Dec 17, 2020 · Updated 5 years ago
- 🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes. ☆261 · Apr 14, 2026 · Updated 2 weeks ago
- ☆52 · Sep 11, 2023 · Updated 2 years ago
- PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer. ☆89 · Nov 3, 2023 · Updated 2 years ago
- Disambiguate Japanese heteronyms ☆32 · Oct 3, 2023 · Updated 2 years ago
- ☆24 · Mar 18, 2026 · Updated last month