ndl-lab / hiragana_mojigazo
Character image dataset (73-character hiragana version)
☆14 Updated 4 years ago
Related projects:
- OCR training dataset created in the Digitized Materials OCR Text Conversion Project ☆63 Updated 2 months ago
- Namelti: an automatic transcription generation library for person names in Katakana ☆20 Updated last year
- Python version of the Japanese semantic role labeling system (ASA) ☆23 Updated last year
- Accommodation Search Dialog Corpus (宿泊施設探索対話コーパス) ☆23 Updated 8 months ago
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ☆75 Updated last year
- NDL-DocL dataset (document image layout dataset) ☆24 Updated last year
- Japanese tokenizer for Transformers ☆77 Updated 9 months ago
- Mecab + NEologd + Docker + Python3 ☆35 Updated 2 years ago
- Viterbi-based accelerated tokenizer (Python wrapper) ☆39 Updated 2 weeks ago
- Dialogue corpus created by crawling Open2ch (おーぷん2ちゃんねる) ☆93 Updated 3 years ago
- Flatten nested iterable object for Python (Pure-Python implementation) ☆28 Updated 4 years ago
- Japanese synonym library ☆51 Updated 2 years ago
- Implementations of data augmentation for Japanese NLP ☆63 Updated last year
- ☆26 Updated last month
- ☆19 Updated last year
- Japanese-BPEEncoder ☆39 Updated 3 years ago
- Japanese name-matching (entity resolution) dataset created from Wikipedia ☆34 Updated 4 years ago
- ☆16 Updated 5 years ago
- Evidence-based Explanation Dataset (AACL-IJCNLP 2020) ☆18 Updated 3 years ago
- Japanese named entity recognition dataset built using Wikipedia ☆132 Updated last year
- Repository for the TRF (text readability features) publication ☆39 Updated 5 years ago
- Laboro BERT Japanese: Japanese BERT Pre-Trained With Web-Corpus ☆72 Updated 2 years ago
- ☆70 Updated last year
- Text-only archives of www.aozora.gr.jp ☆74 Updated last year
- A paraphrase database for Japanese text simplification ☆32 Updated 7 years ago
- Japanese sentence segmentation library for Python ☆65 Updated last year
- JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset ☆21 Updated 5 months ago
- Dataset of n-gram frequency statistics for OCR text data created from digitized materials ☆13 Updated last year
- Yet another sentence-level tokenizer for Japanese text ☆21 Updated last year
- ☆71 Updated 5 years ago