llm-jp/llm-jp-tokenizer


llm-jp / llm-jp-tokenizer

☆48

Alternatives and similar repositories for llm-jp-tokenizer

Users who are interested in llm-jp-tokenizer are comparing it to the libraries listed below.


swallow-llm / swallow-evaluation
Evaluation scripts for large language models from the Swallow project
☆25 · Sep 17, 2025 · Updated 10 months ago
matsuolab / ucllm_nedo_prod
☆56 · Jun 17, 2024 · Updated 2 years ago
osekilab / JCoLA
☆19 · Apr 21, 2026 · Updated 3 months ago
pfnet-research / pfgen-bench
Preferred Generation Benchmark
☆102 · Mar 6, 2026 · Updated 4 months ago
stardust-coder / japanese-lm-med-harness
☆11 · Oct 2, 2024 · Updated last year
llm-jp / llm-jp-4-cookbook
Example scripts for LLM-jp-4 models
☆31 · Jun 23, 2026 · Updated last month
KanHatakeyama / synthetic-texts-by-llm
☆27 · Nov 4, 2024 · Updated last year
llm-jp / llm-jp-eval
☆165 · Jul 19, 2026 · Updated last week
OpenMOSE / RWKV5-LM-LoRA
RWKV v5, v6 LoRA trainer on CUDA and ROCm platforms. RWKV is an RNN with transformer-level LLM performance. It can be directly trained like …
☆13 · Mar 24, 2024 · Updated 2 years ago
llm-jp / llm-jp-sft
☆62 · Jun 13, 2024 · Updated 2 years ago
opensource-jp / Open-Source-AI
Japanese translation of the Open Source AI Definition
☆27 · Nov 15, 2024 · Updated last year
nlp-waseda / JMMLU
Japanese Massive Multitask Language Understanding Benchmark
☆40 · Oct 7, 2025 · Updated 9 months ago
llm-jp / llm-jp-corpus
☆47 · Feb 2, 2024 · Updated 2 years ago
matsuolab / llm_bridge_prod
☆34 · Aug 21, 2025 · Updated 11 months ago
rioyokotalab / Megatron-Llama2
2023 ABCI Llama-2 continual learning project
☆14 · Jan 22, 2024 · Updated 2 years ago
ce-lery / japanese-mistral-300m-recipe
☆19 · Mar 12, 2026 · Updated 4 months ago
okoge-kaz / llm-recipes
Ongoing research project for continual pre-training of LLMs (dense mode)
☆45 · Mar 3, 2025 · Updated last year
NISTEP / minutes
Meeting minutes metadata set
☆12 · Jun 10, 2018 · Updated 8 years ago
mynlp / niilc-qa
NIILC QA data
☆18 · Nov 20, 2015 · Updated 10 years ago
mamorlis / nlpbook
Support site for the book "Textbook of Natural Language Processing" (自然言語処理の教科書)
☆14 · Apr 1, 2025 · Updated last year
Aratako / Japanese-RP-Bench
☆19 · Sep 29, 2024 · Updated last year
SatoruMuro / SegRef3D
SegRef3D: AI-Powered Segmentation and Interactive Refinement for Labor-Saving 3D Reconstruction
☆18 · Jul 13, 2026 · Updated last week
ueda-keisuke / CC-CEDICT-MeCab
CC-CEDICT-MeCab is a MeCab dictionary for Chinese (Mandarin) text segmentation
☆13 · Apr 9, 2020 · Updated 6 years ago
okoge-kaz / moe-recipes
Ongoing research on training Mixture-of-Experts models.
☆22 · Sep 16, 2024 · Updated last year
recursal / minmodmon
Mini Model Daemon
☆13 · Nov 9, 2024 · Updated last year
lighttransport / japanese-llama-experiment
Japanese LLaMa experiment
☆54 · Dec 27, 2025 · Updated 6 months ago
schroneko / systemprompts
☆136 · Jan 30, 2026 · Updated 5 months ago
adtech-labs / kernda
Add conda activation to an IPython kernel spec
☆10 · Mar 12, 2019 · Updated 7 years ago
hotchpotch / fast-bunkai
⚡ Japanese sentence splitting (sentence boundary detection), 40–250× faster via a Rust-accelerated Python library with near-perfect API compatibility with …
☆75 · Oct 14, 2025 · Updated 9 months ago
laboroai / Laboro-ParaCorpus
Scripts for creating a Japanese-English parallel corpus and training NMT models
☆19 · Nov 9, 2021 · Updated 4 years ago
sbintuitions / JMTEB
The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark)
☆93 · Mar 16, 2026 · Updated 4 months ago
colorfulscoop / sbert-ja
Code to train a Japanese Sentence-BERT model for the Hugging Face Model Hub
☆11 · Aug 8, 2021 · Updated 4 years ago
llm-jp / llm-jp-modernbert
This repository contains the training and evaluation code for llm-jp-modernbert-base.
☆17 · Jun 17, 2025 · Updated last year
ku-nlp / AnnotatedFKCCorpus
Annotated Fuman Kaitori Center Corpus
☆18 · Dec 18, 2023 · Updated 2 years ago
ayaco0 / paper-survey
Slowly summarizing surveyed papers in issues.
☆15 · May 15, 2024 · Updated 2 years ago
daac-tools / vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
☆296 · Updated this week
hppRC / bert-classification-tutorial-2024
[2024 edition] Text classification with BERT
☆30 · Jul 8, 2024 · Updated 2 years ago
kenoharada / labudy
☆19 · Nov 12, 2025 · Updated 8 months ago
oshizo / JapaneseEmbeddingEval
☆183 · Oct 9, 2024 · Updated last year