line / LINE-DistilBERT-Japanese

DistilBERT model pre-trained on 131 GB of Japanese web text. The teacher model is a BERT-base model built in-house at LINE.

☆45

Alternatives and similar repositories for LINE-DistilBERT-Japanese

Users who are interested in LINE-DistilBERT-Japanese are comparing it to the repositories listed below.

kunishou / databricks-dolly-15k-ja
☆86 Updated 2 years ago
p-geon / ja-tokenizer-docker-py
Mecab + NEologd + Docker + Python3
☆36 Updated 3 years ago
daac-tools / python-vibrato
Viterbi-based accelerated tokenizer (Python wrapper)
☆43 Updated last year
zomysan / alkana.py
A tool to get the katakana reading of an alphabetical string.
☆33 Updated 4 years ago
MosasoM / inappropriate-words-ja
A collection of inappropriate expressions in Japanese. It should be useful for tasks such as data cleaning in natural language processing.
☆196 Updated 3 years ago
yagays / ja-timex
A rule-based analyzer that extracts and normalizes temporal expressions written in natural language
☆140 Updated 7 months ago
nu-dialogue / jmultiwoz
JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset, LREC-COLING 2024
☆25 Updated last year
pfnet-research / pfgen-bench
Preferred Generation Benchmark
☆85 Updated last month
ndl-lab / pdmocrdataset-part1
OCR training dataset created as part of a project to convert digitized materials into text via OCR
☆74 Updated last year
1never / open2ch-dialogue-corpus
A dialogue corpus built by crawling Open2ch (おーぷん2ちゃんねる)
☆99 Updated 4 years ago
yagays / emoji-ja
📙 Dictionary of Japanese readings, keywords, and categories for Unicode emoji 📙
☆81 Updated 7 months ago
tkm2261 / kaggler_ja_slack_archiver
Slack log archive system and viewer on GAE
☆42 Updated 6 years ago
tanreinama / GPTSAN
General-purpose Switch Transformer-based Japanese language model
☆118 Updated 2 years ago
megagonlabs / vecscan
☆51 Updated 2 years ago
karakuri-ai / gptuber-by-langchain
GPT acts as a YouTuber
☆63 Updated last year
hppRC / bert-classification-tutorial-2024
[2024 edition] Text classification with BERT
☆29 Updated last year
smartnews-smri / house-of-councillors
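A database of bills, members, parliamentary groups, and written questions, compiled from the official website of the House of Councillors. The data can be freely downloaded and searched for both commercial and non-commercial use.
☆104 Updated this week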
shi3z / alpaca_ja
A Japanese translation of the Alpaca dataset
☆86 Updated 2 years ago
hiroshi-matsuda-rit / NLP2024-tutorial-3
NLP2024 Tutorial 3: Learning Japanese large language models by building them - environment setup steps and source code / NLP2024 Tutorial 3: Practicing how to build a Japanese large-scale language model - E…
☆112 Updated last year
sotokisehiro / chatux-server-llm
☆32 Updated last year
megagonlabs / asdc
Accommodation Search Dialog Corpus (宿泊施設探索対話コーパス)
☆25 Updated last year
karaage0703 / ChatLLM
Test script of LLMs
☆55 Updated last year
shirowanisan / tsukuyomichan-talksoft
AI Talksoft of Tsukuyomichan
☆41 Updated 2 years ago
BandaiNamcoResearchInc / DistilBERT-base-jp
☆161 Updated 5 years ago
remdis / remdis
The Remdis toolkit: Building advanced real-time multimodal dialogue systems with incremental processing and large language models
☆100 Updated 4 months ago
ttizze / BabyDORA
☆100 Updated 10 months ago
DigitalNatureGroup / Remote_Voice_Recognition
Use cases of speech recognition in remote meetings
☆60 Updated 3 years ago
llm-jp / llm-jp-tokenizer
☆42 Updated last month
smartnews-smri / house-of-representatives
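A database of bills submitted to the Diet, compiled from the official website of the House of Representatives. The data can be freely downloaded and searched for both commercial and non-commercial use.
☆173 Updated this week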
shi3z / speech-to-speech-japanese
☆40 Updated last year