ndl-lab / layout-dataset
NDL-DocL dataset (document image layout dataset)
☆26 Updated last year
Related projects
Alternatives and complementary repositories for layout-dataset
- OCR training dataset created in the OCR text conversion project for digitized materials ☆64 Updated 4 months ago
- Japanese tokenizer for Transformers ☆78 Updated 11 months ago
- This repository has implementations of data augmentation for NLP for Japanese. ☆64 Updated last year
- Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) ☆76 Updated last year
- Japanese-BPEEncoder ☆39 Updated 3 years ago
- Japanese CLIP model ☆13 Updated last year
- ☆19 Updated last year
- Evaluation dataset for the honorific (keigo) conversion task ☆20 Updated last year
- Japanese synonym library ☆52 Updated 2 years ago
- Viterbi-based accelerated tokenizer (Python wrapper) ☆40 Updated 2 months ago
- Mecab + NEologd + Docker + Python3 ☆35 Updated 2 years ago
- A program that automatically extracts figures and tables ☆19 Updated 3 years ago
- ☆18 Updated last month
- IPAdic packaged for easy use from Python. ☆25 Updated 3 years ago
- ☆82 Updated last year
- Japanese CLIP by rinna Co., Ltd. ☆68 Updated 11 months ago
- [2024 edition] Text classification with BERT ☆24 Updated 4 months ago
- Japanese instruction data ☆22 Updated last year
- Japanese Movie Recommendation Dialogue dataset ☆27 Updated 2 years ago
- GPT acts as a YouTuber ☆62 Updated 11 months ago
- Evidence-based Explanation Dataset (AACL-IJCNLP 2020) ☆18 Updated 3 years ago
- Finding all pairs of similar documents time- and memory-efficiently ☆58 Updated 2 years ago
- Accommodation Search Dialog Corpus (宿泊施設探索対話コーパス) ☆23 Updated 10 months ago
- Easily turn large English text datasets into Japanese text datasets using open LLMs. ☆14 Updated last week
- ☆16 Updated 3 years ago
- RealPersonaChat: A Realistic Persona Chat Corpus with Interlocutors' Own Personalities ☆48 Updated 8 months ago
- The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark) ☆33 Updated this week
- ☆31 Updated 3 months ago
- 📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information ☆129 Updated last year
- ☆101 Updated this week