megagonlabs / UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
☆28 · last updated Mar 24, 2023
Alternatives and similar repositories for UD_Japanese-GSD
Users interested in UD_Japanese-GSD are comparing it to the repositories listed below:
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ☆77 · last updated Jun 23, 2023
- Utility scripts for preprocessing Wikipedia texts for NLP ☆78 · last updated Apr 9, 2024
- Japanese tokenizer for Transformers ☆79 · last updated Dec 15, 2023
- 📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information ☆131 · last updated Mar 15, 2023
- Trials of pre-trained BERT models for the medical domain in Japanese. ☆12 · last updated Nov 21, 2020
- Japanese named entity recognition dataset built from Wikipedia ☆142 · last updated Sep 2, 2023
- DIRECT: Direct and Indirect REsponses in Conversational Text Corpus ☆17 · last updated Jul 1, 2021
- Word embeddings learned without word segmentation ☆14 · last updated Mar 19, 2017
- ☆29 · last updated Apr 10, 2025
- A tool for comparing tokenizers ☆120 · last updated Nov 9, 2025
- ☆31 · last updated Apr 4, 2018
- ☆100 · last updated Jul 23, 2023
- Evidence-based Explanation Dataset (AACL-IJCNLP 2020) ☆18 · last updated Dec 17, 2020
- Japanese entity resolution (name matching) dataset built from Wikipedia ☆35 · last updated Mar 10, 2020
- Japanese data from the Google UDT 2.0. ☆38 · last updated Nov 12, 2025
- Evaluation dataset for the honorific (keigo) conversion task ☆21 · last updated Nov 24, 2022
- ☆10 · last updated Sep 14, 2022
- ☆10 · last updated Aug 13, 2012
- ☆28 · last updated Apr 5, 2022
- ☆27 · last updated Nov 12, 2025
- 🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto. ☆21 · last updated Jun 1, 2025
- Japanese sentence segmentation library for Python ☆73 · last updated Apr 3, 2023
- ☆11 · last updated Sep 7, 2021
- Yet another sentence-level tokenizer for Japanese text ☆24 · last updated Nov 27, 2025
- ☆11 · last updated Nov 10, 2020
- Learning to Hash for Maximum Inner Product Search ☆12 · last updated Jan 21, 2022
- JGLUE: Japanese General Language Understanding Evaluation for huggingface datasets ☆12 · last updated Mar 31, 2025
- ☆11 · last updated Feb 1, 2026
- CaboCha wrapper for Python3 ☆46 · last updated Jul 5, 2018
- ☆161 · last updated Oct 19, 2020
- Sudachi's synonyms dictionary ☆13 · last updated Jan 25, 2026
- Python library for calculating Japanese public holidays ☆15 · last updated Jul 25, 2022
- Code for a COLING 2020 paper ☆13 · last updated Feb 3, 2026
- Japanese word embedding with Sudachi and NWJC 🌿 ☆169 · last updated Mar 1, 2024
- Kyoto University Web Document Leads Corpus ☆83 · last updated Dec 18, 2023
- ☆33 · last updated Apr 27, 2020
- ☆35 · last updated Dec 17, 2020
- BERT with SentencePiece for Japanese text. ☆33 · last updated Oct 28, 2021
- An integrated Japanese analyzer based on foundation models ☆138 · last updated Feb 2, 2026