masanorihirano/llm-japanese-dataset


masanorihirano / llm-japanese-dataset

Japanese chat dataset for building LLMs

☆88

Alternatives and similar repositories for llm-japanese-dataset

Users who are interested in llm-japanese-dataset are comparing it to the libraries listed below.

kunishou / do-not-answer-ja
☆24 · Dec 15, 2023 · Updated 2 years ago
yuzu-ai / japanese-llm-ranking
☆50 · Apr 10, 2024 · Updated 2 years ago
nobu-g / cohesion-analysis
Code for COLING 2020 Paper
☆13 · Feb 3, 2026 · Updated 5 months ago
colorfulscoop / sbert-ja
Code to train Sentence BERT Japanese model for Hugging Face Model Hub
☆11 · Aug 8, 2021 · Updated 4 years ago
KanHatakeyama / JapaneseWarcParser
☆16 · Mar 4, 2024 · Updated 2 years ago
megagonlabs / instruction_ja
Japanese instruction data
☆24 · Jul 13, 2023 · Updated 3 years ago
jungokasai / IgakuQA
☆52 · Mar 31, 2023 · Updated 3 years ago
yahoojapan / JGLUE
JGLUE: Japanese General Language Understanding Evaluation
☆346 · Mar 31, 2025 · Updated last year
llm-jp / llm-jp-eval
☆165 · Jul 19, 2026 · Updated last week
HojiChar / HojiChar
The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing.
☆128 · Jul 17, 2026 · Updated last week
ku-nlp / AnnotatedFKCCorpus
Annotated Fuman Kaitori Center Corpus
☆18 · Dec 18, 2023 · Updated 2 years ago
llm-jp / llm-jp-sft
☆62 · Jun 13, 2024 · Updated 2 years ago
llm-jp / awesome-japanese-llm
Overview of Japanese LLMs
☆1,422 · Updated this week
yagays / nayose-wikipedia-ja
Japanese name-matching (nayose) dataset built from Wikipedia
☆35 · Mar 10, 2020 · Updated 6 years ago
lighttransport / japanese-llama-experiment
Japanese LLaMa experiment
☆54 · Dec 27, 2025 · Updated 7 months ago
llm-jp / llm-jp-corpus
☆47 · Feb 2, 2024 · Updated 2 years ago
cl-tohoku / keigo_transfer_task
Evaluation dataset for the honorific (keigo) conversion task
☆21 · Nov 24, 2022 · Updated 3 years ago
verypluming / JaNLI
☆17 · May 31, 2023 · Updated 3 years ago
NovelAI / novelai-tokenizer
Sentencepiece based BPE tokenizer for English and Japanese language text.
☆29 · Apr 4, 2024 · Updated 2 years ago
megagonlabs / ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
☆18 · Dec 17, 2020 · Updated 5 years ago
WorksApplications / uzushio
☆24 · Mar 18, 2026 · Updated 4 months ago
laksjdjf / pfg
☆20 · Mar 28, 2023 · Updated 3 years ago
SkelterLabsInc / JaQuAD
JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)
☆111 · Mar 2, 2022 · Updated 4 years ago
shisa-ai / shaberi
Lightblue LLM Eval Framework: tengu, elyza100, ja-mtbench, rakuda
☆19 · Apr 29, 2026 · Updated 3 months ago
nlp-waseda / JMMLU
Japanese Massive Multitask Language Understanding Benchmark
☆40 · Oct 7, 2025 · Updated 9 months ago
shihono / evaluate_japanese_w2v
Script to evaluate a pre-trained Japanese word2vec model on a Japanese similarity dataset
☆12 · Nov 4, 2024 · Updated last year
AUGMXNT / shisa
☆43 · Mar 30, 2024 · Updated 2 years ago
taishi-i / awesome-japanese-nlp-resources
A curated list of resources for Japanese natural language processing (NLP): Python libraries, LLMs, dictionaries, corpora, and datasets. …
☆1,000 · Updated this week
jqk09a / japanese-daily-dialogue
☆60 · Mar 17, 2023 · Updated 3 years ago
yuki-yano / slack-hot-channel-deno
☆16 · Oct 21, 2024 · Updated last year
ce-lery / japanese-mistral-300m-recipe
☆19 · Mar 12, 2026 · Updated 4 months ago
ujiuji1259 / shinra-attribute-extraction
☆11 · Sep 7, 2021 · Updated 4 years ago
tayayan / HiraganaSuisho
☆17 · Jul 21, 2025 · Updated last year
Hajime-Y / BitNet-b158
☆20 · Apr 29, 2024 · Updated 2 years ago
cl-tohoku / JAQKET_baseline
☆16 · Oct 11, 2021 · Updated 4 years ago
wandb / llm-leaderboard
LLM evaluation project for Japanese tasks
☆94 · Updated this week
cl-tohoku / ILYS-aoba-chatbot
☆23 · Oct 1, 2021 · Updated 4 years ago
kunishou / Japanese-Alpaca-LoRA
☆141 · Apr 2, 2023 · Updated 3 years ago
turingmotors / heron
Heron is a library that seamlessly integrates multiple Vision and Language models, as well as Video and Language models.
☆177 · Jun 13, 2024 · Updated 2 years ago