octanove/shiba

Pytorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer.

☆89

Alternatives and similar repositories for shiba

Users interested in shiba are comparing it to the libraries listed below.

shunk031 / allennlp-shiba-model
AllenNLP integration for Shiba: Japanese CANINE model
☆12 · Jun 26, 2021 · Updated 5 years ago
WorksApplications / SudachiTra
Japanese tokenizer for Transformers
☆81 · Dec 15, 2023 · Updated 2 years ago
megagonlabs / ginza-transformers
Use custom tokenizers in spacy-transformers
☆16 · Aug 9, 2022 · Updated 3 years ago
chemicaltree / tetra
☆10 · Sep 14, 2022 · Updated 3 years ago
ku-nlp / AnnotatedFKCCorpus
Annotated Fuman Kaitori Center Corpus
☆18 · Dec 18, 2023 · Updated 2 years ago
yagays / ja-timex
A rule-based analyzer that extracts and normalizes temporal expressions written in natural language
☆141 · Feb 27, 2025 · Updated last year
junya-takayama / DIRECT
DIRECT: Direct and Indirect REsponses in Conversational Text Corpus
☆17 · Jul 1, 2021 · Updated 5 years ago
himkt / awesome-bert-japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
☆132 · Mar 15, 2023 · Updated 3 years ago
shigashiyama / nlp_survey
☆15 · Mar 31, 2020 · Updated 6 years ago
verypluming / JaNLI
☆17 · May 31, 2023 · Updated 3 years ago
megagonlabs / jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
☆77 · Jun 23, 2023 · Updated 3 years ago
megagonlabs / bunkai
Sentence boundary disambiguation tool for Japanese texts (a Japanese sentence boundary detector)
☆200 · Mar 26, 2024 · Updated 2 years ago
nttcslab / japanese-dialog-transformers
Code for evaluating Japanese pretrained models provided by NTT Ltd.
☆246 · Jun 21, 2023 · Updated 3 years ago
cl-tohoku / keigo_transfer_task
Evaluation dataset for the honorific (keigo) conversion task
☆21 · Nov 24, 2022 · Updated 3 years ago
megagonlabs / ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
☆18 · Dec 17, 2020 · Updated 5 years ago
retarfi / language-pretraining
Pre-training Language Models for Japanese
☆50 · Jul 2, 2023 · Updated 3 years ago
daac-tools / vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
☆297 · Jul 20, 2026 · Updated last week
WorksApplications / chikkarpy
Japanese synonym library
☆55 · Feb 7, 2022 · Updated 4 years ago
ou-medinfo / medbertjp
Trials of pre-trained BERT models for the medical domain in Japanese.
☆13 · Nov 21, 2020 · Updated 5 years ago
HojiChar / HojiChar
A robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing.
☆128 · Jul 17, 2026 · Updated last week
izuna385 / Wikia-and-Wikipedia-EL-Dataset-Creator
You can create datasets from Wikia/Wikipedia that can be used for entity recognition and entity linking. Dumps for ja-wiki and VTuber-wik…
☆18 · May 2, 2021 · Updated 5 years ago
wtsnjp / MioGatto
An annotation tool for grounding of formulae
☆24 · May 28, 2024 · Updated 2 years ago
himkt / konoha
🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes.
☆263 · Jul 19, 2026 · Updated last week
yagays / nayose-wikipedia-ja
Japanese name-matching (nayose) dataset built from Wikipedia
☆35 · Mar 10, 2020 · Updated 6 years ago
akirakubo / bert-japanese-aozora
Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy
☆40 · Aug 8, 2020 · Updated 5 years ago
SkelterLabsInc / JaQuAD
JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)
☆111 · Mar 2, 2022 · Updated 4 years ago
megagonlabs / t5-japanese
Code to pre-train Japanese T5 models
☆40 · Sep 7, 2021 · Updated 4 years ago
megagonlabs / UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
☆28 · Mar 24, 2023 · Updated 3 years ago
tsuruoka-lab / AMI-Meeting-Parallel-Corpus
AMI Meeting Parallel Corpus
☆13 · Dec 11, 2020 · Updated 5 years ago
ku-nlp / kwja
An integrated Japanese analyzer based on foundation models
☆145 · Jul 18, 2026 · Updated last week
laboroai / Laboro-ParaCorpus
Scripts for creating a Japanese-English parallel corpus and training NMT models
☆19 · Nov 9, 2021 · Updated 4 years ago
daac-tools / python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. (Python wrapper)
☆21 · May 30, 2026 · Updated last month
cl-tohoku / bert-japanese
BERT models for Japanese text.
☆551 · Mar 23, 2024 · Updated 2 years ago
moisutsu / classopt
Argument parser for Python based on classes, inspired by StructOpt
☆62 · Sep 17, 2023 · Updated 2 years ago
daac-tools / find-simdoc
Finding all pairs of similar documents time- and memory-efficiently
☆62 · Mar 13, 2025 · Updated last year
ndl-lab / ndlngramdata
Dataset of n-gram frequency statistics computed from OCR text of digitized materials
☆17 · Jan 10, 2023 · Updated 3 years ago
verypluming / JSICK
Repository for JSICK
☆46 · May 31, 2023 · Updated 3 years ago
wwwcojp / ja_sentence_segmenter
Japanese sentence segmentation library for Python
☆76 · Updated this week
ku-nlp / KWDLC
Kyoto University Web Document Leads Corpus
☆84 · Dec 18, 2023 · Updated 2 years ago