kajyuuen/daaja

This repository contains implementations of data augmentation techniques for Japanese NLP.

☆64

Alternatives and similar repositories for daaja

Users that are interested in daaja are comparing it to the libraries listed below.

kajyuuen / funer
Funer is a rule-based Named Entity Recognition tool.
☆22 · Apr 21, 2022 · Updated 4 years ago
cl-tohoku / AIO2_DPR_baseline
https://www.nlp.ecei.tohoku.ac.jp/projects/aio/
☆16 · Aug 4, 2022 · Updated 3 years ago
osekilab / JCoLA
☆19 · Apr 21, 2026 · Updated 3 months ago
daac-tools / find-simdoc
Finding all pairs of similar documents time- and memory-efficiently
☆62 · Mar 13, 2025 · Updated last year
wwwcojp / ja_sentence_segmenter
Japanese sentence segmentation library for Python
☆75 · Updated this week
aiishii / JEMHopQA
☆30 · Apr 10, 2025 · Updated last year
SkelterLabsInc / JaQuAD
JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)
☆110 · Mar 2, 2022 · Updated 4 years ago
ku-nlp / kwja
An integrated Japanese analyzer based on foundation models
☆145 · Jul 18, 2026 · Updated last week
himkt / awesome-bert-japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
☆132 · Mar 15, 2023 · Updated 3 years ago
megagonlabs / bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
☆200 · Mar 26, 2024 · Updated 2 years ago
upura / weekly-kaggle-news-archive
https://weeklykagglenews.substack.com
☆24 · Dec 31, 2022 · Updated 3 years ago
yagays / nayose-wikipedia-ja
Japanese entity name matching (nayose) dataset created from Wikipedia
☆35 · Mar 10, 2020 · Updated 6 years ago
stockmarkteam / ner-wikipedia-dataset
Japanese named entity recognition dataset built using Wikipedia
☆143 · Sep 2, 2023 · Updated 2 years ago
yahoojapan / JGLUE
JGLUE: Japanese General Language Understanding Evaluation
☆346 · Mar 31, 2025 · Updated last year
sonoisa / clip-japanese
Japanese CLIP model
☆13 · Sep 15, 2025 · Updated 10 months ago
daac-tools / vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
☆297 · Updated this week
WorksApplications / SudachiTra
Japanese tokenizer for Transformers
☆81 · Dec 15, 2023 · Updated 2 years ago
chakki-works / Japanese-Company-Lexicon
☆99 · Jul 23, 2023 · Updated 3 years ago
megagonlabs / ginza-transformers
Use custom tokenizers in spacy-transformers
☆16 · Aug 9, 2022 · Updated 3 years ago
hottolink / pydams
Python bindings around DAMS: Geocoding Tool for Japanese Address
☆20 · May 19, 2020 · Updated 6 years ago
retarfi / language-pretraining
Pre-training Language Models for Japanese
☆50 · Jul 2, 2023 · Updated 3 years ago
megagonlabs / ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
☆18 · Dec 17, 2020 · Updated 5 years ago
akirakubo / bert-japanese-aozora
Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy
☆40 · Aug 8, 2020 · Updated 5 years ago
nobu-g / cohesion-analysis
Code for COLING 2020 Paper
☆13 · Feb 3, 2026 · Updated 5 months ago
singletongue / wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
☆78 · Apr 9, 2024 · Updated 2 years ago
ids-cv / wrime
☆177 · Sep 11, 2025 · Updated 10 months ago
HojiChar / HojiChar
The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing.
☆128 · Jul 17, 2026 · Updated last week
takapy0210 / nlplot
Visualization Module for Natural Language Processing
☆238 · Sep 21, 2022 · Updated 3 years ago
eggplants / awesome-japanese-censored-words
Awesome List of Sources of Japanese Censored Words
☆19 · Sep 11, 2022 · Updated 3 years ago
cl-tohoku / keigo_transfer_task
Evaluation dataset for the keigo (honorific speech) conversion task
☆21 · Nov 24, 2022 · Updated 3 years ago
anyai-28 / nishika_jpo_2nd_solution
☆11 · Feb 7, 2024 · Updated 2 years ago
uehara1414 / japanize-matplotlib
Makes matplotlib display Japanese text just by installing and importing it
☆203 · Apr 30, 2024 · Updated 2 years ago
Katsumata420 / wikihow_japanese
☆35 · Dec 17, 2020 · Updated 5 years ago
tkuri / albumentations_test
albumentations test
☆11 · Jun 23, 2020 · Updated 6 years ago
BandaiNamcoResearchInc / DistilBERT-base-jp
☆161 · Oct 19, 2020 · Updated 5 years ago
daac-tools / python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. (Python wrapper)
☆21 · May 30, 2026 · Updated last month
NibuTake / LiNGAM-fast
☆12 · May 17, 2018 · Updated 8 years ago
line / LINE-DistilBERT-Japanese
DistilBERT model pre-trained on 131 GB of Japanese web text. The teacher model is a BERT-base model built in-house at LINE.
☆47 · Mar 22, 2023 · Updated 3 years ago
cl-tohoku / JAQKET_baseline
☆16 · Oct 11, 2021 · Updated 4 years ago