megagonlabs/UD_Japanese-GSD

megagonlabs / UD_Japanese-GSD

Japanese data from the Google UDT 2.0.

☆28

Alternatives and similar repositories for UD_Japanese-GSD

Users who are interested in UD_Japanese-GSD are comparing it to the repositories listed below.

singletongue / wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
☆78 · Apr 9, 2024 · Updated 2 years ago
megagonlabs / jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
☆77 · Jun 23, 2023 · Updated 3 years ago
WorksApplications / SudachiTra
Japanese tokenizer for Transformers
☆81 · Dec 15, 2023 · Updated 2 years ago
stockmarkteam / ner-wikipedia-dataset
Japanese named entity recognition dataset built from Wikipedia
☆143 · Sep 2, 2023 · Updated 2 years ago
taishi-i / toiro
A tool for comparing tokenizers
☆122 · Nov 9, 2025 · Updated 8 months ago
ujiuji1259 / shinra-attribute-extraction
☆11 · Sep 7, 2021 · Updated 4 years ago
himkt / awesome-bert-japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
☆132 · Mar 15, 2023 · Updated 3 years ago
ou-medinfo / medbertjp
Trials of pre-trained BERT models for the medical domain in Japanese.
☆13 · Nov 21, 2020 · Updated 5 years ago
shunk031 / huggingface-datasets_JGLUE
JGLUE: Japanese General Language Understanding Evaluation for huggingface datasets
☆14 · Mar 31, 2025 · Updated last year
megagonlabs / ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
☆18 · Dec 17, 2020 · Updated 5 years ago
informatix-inc / bert
☆28 · Apr 5, 2022 · Updated 4 years ago
megagonlabs / ginza-transformers
Use custom tokenizers in spacy-transformers
☆16 · Aug 9, 2022 · Updated 3 years ago
yagays / nayose-wikipedia-ja
Japanese name-matching (nayose) dataset created from Wikipedia
☆35 · Mar 10, 2020 · Updated 6 years ago
ku-nlp / WikipediaAnnotatedCorpus
☆30 · Jul 1, 2026 · Updated 3 weeks ago
hiroki13 / instance-based-ner
☆18 · Feb 15, 2023 · Updated 3 years ago
cl-tohoku / keigo_transfer_task
Evaluation dataset for the honorific (keigo) conversion task
☆21 · Nov 24, 2022 · Updated 3 years ago
junya-takayama / DIRECT
DIRECT: Direct and Indirect REsponses in Conversational Text Corpus
☆17 · Jul 1, 2021 · Updated 5 years ago
akirakubo / bert-japanese-aozora
Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy
☆40 · Aug 8, 2020 · Updated 5 years ago
chakki-works / Japanese-Company-Lexicon
☆99 · Jul 23, 2023 · Updated 3 years ago
masayu-a / NAIST-JENE
☆10 · Aug 13, 2012 · Updated 13 years ago
shimo-lab / sembei
Word embeddings that do not rely on word segmentation
☆14 · Mar 19, 2017 · Updated 9 years ago
UniversalDependencies / UD_Japanese-BCCWJ
☆27 · May 6, 2026 · Updated 2 months ago
aiishii / JEMHopQA
☆30 · Apr 10, 2025 · Updated last year
nobu-g / cohesion-analysis
Code for COLING 2020 Paper
☆13 · Feb 3, 2026 · Updated 5 months ago
de9uch1 / fairseq-tutorial
Fairseq tutorial
☆18 · May 18, 2022 · Updated 4 years ago
UniversalDependencies / UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
☆40 · May 6, 2026 · Updated 2 months ago
nlp-waseda / Kanbun-LM
Code for paper "Kanbun-LM: Reading and Translating Classical Chinese in Japanese Method by Language Models"
☆21 · Jul 10, 2023 · Updated 3 years ago
ikegami-yukino / sengiri
Yet another sentence-level tokenizer for Japanese text
☆24 · Nov 27, 2025 · Updated 8 months ago
azu / sudachi-synonyms-dictionary
Sudachi's synonyms dictionary
☆15 · Updated this week
wwwcojp / ja_sentence_segmenter
Japanese sentence segmentation library for Python
☆76 · Updated this week
daac-tools / python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. (Python wrapper)
☆21 · May 30, 2026 · Updated last month
yahoojapan / VFD-Dataset
☆11 · Nov 10, 2020 · Updated 5 years ago
WorksApplications / chiVe
Japanese word embedding with Sudachi and NWJC 🌿
☆177 · Mar 1, 2024 · Updated 2 years ago
KodairaTomonori / ThreeLineSummaryDataset
☆31 · Apr 4, 2018 · Updated 8 years ago
BandaiNamcoResearchInc / DistilBERT-base-jp
☆161 · Oct 19, 2020 · Updated 5 years ago
shiroyagicorp / sitq
Learning to Hash for Maximum Inner Product Search
☆12 · Jan 21, 2022 · Updated 4 years ago
skozawa / Comainu
COrpus based Morphological Analyzer with INtegrated User dictionary
☆21 · Mar 30, 2025 · Updated last year
kzinmr / transformers_ner_ja
Japanese NER with Transformers + PyTorch-Lightning + MLflow Tracking
☆15 · Nov 20, 2022 · Updated 3 years ago
octanove / shiba
PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer.
☆89 · Nov 3, 2023 · Updated 2 years ago