megagonlabs/t5-japanese

Code to pre-train Japanese T5 models

☆40

Alternatives and similar repositories for t5-japanese

Users who are interested in t5-japanese are comparing it to the repositories listed below.

sonoisa / t5-japanese
Japanese T5 model
☆118 · Sep 15, 2025 · Updated 10 months ago
yahoojapan / JGLUE
JGLUE: Japanese General Language Understanding Evaluation
☆346 · Mar 31, 2025 · Updated last year
cl-tohoku / keigo_transfer_task
Evaluation dataset for the honorific (keigo) conversion task
☆21 · Nov 24, 2022 · Updated 3 years ago
aiishii / JEMHopQA
☆30 · Apr 10, 2025 · Updated last year
octanove / shiba
PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer.
☆89 · Nov 3, 2023 · Updated 2 years ago
daac-tools / python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. (Python wrapper)
☆21 · May 30, 2026 · Updated last month
colorfulscoop / sbert-ja
Code to train Sentence BERT Japanese model for Hugging Face Model Hub
☆11 · Aug 8, 2021 · Updated 4 years ago
yoichi1484 / subspace
An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024)
☆10 · May 31, 2024 · Updated 2 years ago
osekilab / JCoLA
☆19 · Apr 21, 2026 · Updated 3 months ago
nobu-g / cohesion-analysis
Code for COLING 2020 Paper
☆13 · Feb 3, 2026 · Updated 5 months ago
singletongue / wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
☆78 · Apr 9, 2024 · Updated 2 years ago
cl-tohoku / AIO2_DPR_baseline
https://www.nlp.ecei.tohoku.ac.jp/projects/aio/
☆16 · Aug 4, 2022 · Updated 3 years ago
1never / open2ch-dialogue-corpus
Dialogue corpus created by crawling Open2ch (おーぷん2ちゃんねる)
☆101 · Jun 6, 2021 · Updated 5 years ago
megagonlabs / coop
☘️ Code for Convex Aggregation for Opinion Summarization (Iso et al; Findings of EMNLP 2021)
☆35 · Dec 22, 2022 · Updated 3 years ago
nttcslab / japanese-dialog-transformers
Code for evaluating Japanese pretrained models provided by NTT Ltd.
☆246 · Jun 21, 2023 · Updated 3 years ago
odashi / small_parallel_enja
50k English-Japanese Parallel Corpus for Machine Translation Benchmark.
☆97 · Sep 11, 2019 · Updated 6 years ago
hitachi-nlp / larch
LARCH: Large Language Model-based Automatic Readme Creation with Heuristics
☆17 · Jul 1, 2023 · Updated 3 years ago
hkjeon13 / noising-korean
Adds noise to Korean documents.
☆27 · Nov 9, 2022 · Updated 3 years ago
ohtaman / abci-examples
Simple examples of deep learning on ABCI.
☆24 · Oct 23, 2023 · Updated 2 years ago
WorksApplications / SudachiTra
Japanese tokenizer for Transformers
☆81 · Dec 15, 2023 · Updated 2 years ago
oshizo / JapaneseEmbeddingEval
☆183 · Oct 9, 2024 · Updated last year
sbintuitions / JMTEB
The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark)
☆93 · Updated this week
megagonlabs / UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
☆28 · Mar 24, 2023 · Updated 3 years ago
himkt / awesome-bert-japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
☆132 · Mar 15, 2023 · Updated 3 years ago
SkelterLabsInc / JaQuAD
JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)
☆111 · Mar 2, 2022 · Updated 4 years ago
verypluming / JSICK
Repository for JSICK
☆46 · May 31, 2023 · Updated 3 years ago
WorksApplications / chikkarpy
Japanese synonym library
☆55 · Feb 7, 2022 · Updated 4 years ago
stockmarkteam / ner-wikipedia-dataset
Japanese named entity recognition dataset built from Wikipedia
☆143 · Sep 2, 2023 · Updated 2 years ago
cl-tohoku / bert-japanese
BERT models for Japanese text.
☆551 · Mar 23, 2024 · Updated 2 years ago
japanese-law-analysis / data_set
Datasets related to laws and court precedents
☆53 · Jan 8, 2025 · Updated last year
retarfi / language-pretraining
Pre-training Language Models for Japanese
☆50 · Jul 2, 2023 · Updated 3 years ago
ajb129 / KeyakiTreebank
Keyaki Treebank Parsed Corpus
☆10 · May 15, 2019 · Updated 7 years ago
BandaiNamcoResearchInc / DistilBERT-base-jp
☆161 · Oct 19, 2020 · Updated 5 years ago
snoop2head / KLUE-RBERT
↔️ Utilizing RBERT model structure for KLUE Relation Extraction task
☆15 · Nov 15, 2022 · Updated 3 years ago
kajyuuen / daaja
This repository has implementations of data augmentation for NLP for Japanese.
☆64 · Feb 16, 2023 · Updated 3 years ago
yagays / nayose-wikipedia-ja
Japanese entity name-matching (nayose) dataset created from Wikipedia
☆35 · Mar 10, 2020 · Updated 6 years ago
megagonlabs / bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
☆200 · Mar 26, 2024 · Updated 2 years ago
ndl-lab / huriganacorpus-aozora
Furigana corpus dataset created from Aozora Bunko and Sapie braille data
☆22 · Jan 17, 2024 · Updated 2 years ago
inspection-ai / japanese-toxic-dataset
☆22 · Jan 11, 2023 · Updated 3 years ago