A powerful text cleaner for Japanese web texts
☆12Jan 20, 2024Updated 2 years ago
Alternatives and similar repositories for text-cleaning
Users that are interested in text-cleaning are comparing it to the libraries listed below
- A centrality-based Chinese keyphrase extraction tool☆11Sep 2, 2022Updated 3 years ago
- NAACL'2021: Non-Parametric Few-Shot Learning for Word Sense Disambiguation☆10Jul 1, 2021Updated 4 years ago
- Yet another Python binding for Juman++/KNP/KWJA☆38Updated this week
- 🧨 Japanese Sentence Breaker 🧨☆14Jun 6, 2021Updated 4 years ago
- ☆33Jul 31, 2024Updated last year
- Use custom tokenizers in spacy-transformers☆16Aug 9, 2022Updated 3 years ago
- A Japanese dependency parser based on BERT☆23Oct 26, 2022Updated 3 years ago
- Annotated Fuman Kaitori Center Corpus☆18Dec 18, 2023Updated 2 years ago
- Kyoto University Web Document Leads Corpus☆83Dec 18, 2023Updated 2 years ago
- Rakuten MA (Python version)☆23May 22, 2017Updated 8 years ago
- Yet another sentence-level tokenizer for Japanese text☆24Nov 27, 2025Updated 3 months ago
- ☆29Apr 10, 2025Updated 10 months ago
- COMET-ATOMIC ja☆31Mar 8, 2024Updated last year
- Bluetooth plugin for Flutter☆10Dec 19, 2022Updated 3 years ago
- OPI5 open micro desk design.☆13Mar 6, 2023Updated 3 years ago
- Japanese Massive Multitask Language Understanding Benchmark☆38Oct 7, 2025Updated 5 months ago
- The framework for creating a new platform (like game engine).☆10Jan 11, 2026Updated last month
- Chat with your data while uploading a pdf file and using a local LLM.☆11Mar 19, 2024Updated last year
- Automatically completes every step required for Windows 10 kitting (device setup) using PowerShell.☆12Jun 10, 2025Updated 8 months ago
- A collection of github workflow patterns☆10Feb 1, 2024Updated 2 years ago
- ISDB-S3 fork☆10Dec 13, 2024Updated last year
- Tokenizer POS-tagger Lemmatizer and Dependency-parser for modern and contemporary Japanese☆38Dec 29, 2025Updated 2 months ago
- A TV viewing and recording environment using Mirakurun + EDCB + KonomiTV, built with Docker☆15Jan 18, 2026Updated last month
- ☆11Oct 31, 2021Updated 4 years ago
- Support page for 「行動データの計算論モデリング」 (Computational Modeling of Behavioral Data).☆11Mar 1, 2021Updated 5 years ago
- A library for evaluation of Grammatical Error Correction (GEC). Accepted to ACL'25 Demo: "gec-metrics: A Unified Library for Grammatical …☆14Jan 25, 2026Updated last month
- MPEG-2 TS packet check☆12Jun 3, 2024Updated last year
- ☆10Aug 27, 2025Updated 6 months ago
- The speech synthesis engine of VOICEVOX, a free, medium-quality text-to-speech software☆10Jan 30, 2023Updated 3 years ago
- ☆10Jun 24, 2022Updated 3 years ago
- ATSC 3.0 to MPEG-2 TS Converter☆21Sep 11, 2025Updated 5 months ago
- A collection of build scripts for personal use☆10Updated this week
- A tool for automatic English to Katakana conversion☆15Nov 26, 2025Updated 3 months ago
- A Japanese translation of the alpaca dataset☆86Jun 3, 2023Updated 2 years ago
- Code for evaluating Japanese pretrained models provided by NTT Ltd.☆245Jun 21, 2023Updated 2 years ago
- video content streaming server (support CMAF-ULL distribution)☆11Mar 31, 2022Updated 3 years ago
- EWoK dataset generation framework☆10May 14, 2024Updated last year
- TOTP (Time-based One-Time Password) authentication for Django REST Framework.☆14Feb 5, 2026Updated last month
- Open-source Mandarin biased word dataset☆14Sep 21, 2023Updated 2 years ago