stockmarkteam/ner-wikipedia-dataset

A Japanese named entity recognition dataset built from Wikipedia

☆143

Alternatives and similar repositories for ner-wikipedia-dataset

Users who are interested in ner-wikipedia-dataset are comparing it to the repositories listed below.

megagonlabs / UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
☆28 · Mar 24, 2023 · Updated 3 years ago
yahoojapan / JGLUE
JGLUE: Japanese General Language Understanding Evaluation
☆346 · Mar 31, 2025 · Updated last year
WorksApplications / SudachiTra
Japanese tokenizer for Transformers
☆81 · Dec 15, 2023 · Updated 2 years ago
kajyuuen / funer
Funer is a rule-based named entity recognition tool.
☆22 · Apr 21, 2022 · Updated 4 years ago
ken11 / bert-japanese-ner-finetuning
☆11 · Jun 19, 2022 · Updated 4 years ago
wwwcojp / ja_sentence_segmenter
Japanese sentence segmentation library for Python
☆76 · Updated this week
WorksApplications / chikkarpy
Japanese synonym library
☆55 · Feb 7, 2022 · Updated 4 years ago
megagonlabs / jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
☆77 · Jun 23, 2023 · Updated 3 years ago
cl-tohoku / AIO2_DPR_baseline
https://www.nlp.ecei.tohoku.ac.jp/projects/aio/
☆16 · Aug 4, 2022 · Updated 3 years ago
kajyuuen / daaja
Implementations of data augmentation techniques for Japanese NLP.
☆64 · Feb 16, 2023 · Updated 3 years ago
daac-tools / vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
☆297 · Jul 20, 2026 · Updated last week
Hironsan / IOB2Corpus
Japanese IOB2 tagged corpus for Named Entity Recognition.
☆61 · Feb 25, 2020 · Updated 6 years ago
WorksApplications / chiVe
Japanese word embedding with Sudachi and NWJC 🌿
☆177 · Mar 1, 2024 · Updated 2 years ago
lighttransport / jagger-python
Python binding for Jagger (a C++ implementation of a pattern-based Japanese morphological analyzer)
☆13 · Dec 16, 2025 · Updated 7 months ago
aiishii / JEMHopQA
☆30 · Apr 10, 2025 · Updated last year
himkt / awesome-bert-japanese
📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information
☆132 · Mar 15, 2023 · Updated 3 years ago
SkelterLabsInc / JaQuAD
JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)
☆111 · Mar 2, 2022 · Updated 4 years ago
ku-nlp / kwja
An integrated Japanese analyzer based on foundation models
☆145 · Jul 18, 2026 · Updated last week
cl-tohoku / JAQKET_baseline
☆16 · Oct 11, 2021 · Updated 4 years ago
WorksApplications / uzushio
☆24 · Mar 18, 2026 · Updated 4 months ago
chakki-works / Japanese-Company-Lexicon
☆99 · Jul 23, 2023 · Updated 3 years ago
HojiChar / HojiChar
The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing.
☆128 · Jul 17, 2026 · Updated last week
yagays / nayose-wikipedia-ja
A Japanese name aggregation (entity resolution) dataset built from Wikipedia
☆35 · Mar 10, 2020 · Updated 6 years ago
sociocom / MedNER-J
Latest version of MedEX/J (Japanese disease name extractor)
☆18 · May 17, 2022 · Updated 4 years ago
megagonlabs / bunkai
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
☆200 · Mar 26, 2024 · Updated 2 years ago
1never / open2ch-dialogue-corpus
A dialogue corpus built by crawling Open2ch (おーぷん2ちゃんねる)
☆101 · Jun 6, 2021 · Updated 5 years ago
cl-tohoku / bert-japanese
BERT models for Japanese text.
☆551 · Mar 23, 2024 · Updated 2 years ago
singletongue / wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
☆78 · Apr 9, 2024 · Updated 2 years ago
ujiuji1259 / shinra-attribute-extraction
☆11 · Sep 7, 2021 · Updated 4 years ago
chakki-works / chABSA-dataset
chakki's Aspect-Based Sentiment Analysis dataset
☆142 · Feb 25, 2022 · Updated 4 years ago
megagonlabs / ginza
A Japanese NLP Library using spaCy as framework based on Universal Dependencies
☆865 · Jul 10, 2026 · Updated 2 weeks ago
oshizo / JapaneseEmbeddingEval
☆183 · Oct 9, 2024 · Updated last year
retarfi / language-pretraining
Pre-training Language Models for Japanese
☆50 · Jul 2, 2023 · Updated 3 years ago
ikegami-yukino / neologdn
Japanese text normalizer for mecab-neologd
☆289 · May 6, 2026 · Updated 2 months ago
himkt / konoha
🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes.
☆263 · Jul 19, 2026 · Updated last week
hppRC / simple-simcse-ja
Exploring Japanese SimCSE
☆69 · Oct 31, 2023 · Updated 2 years ago
retrieva / python_stm
☆15 · Feb 7, 2020 · Updated 6 years ago
ymym3412 / acl-papers
Paper summaries for the Association for Computational Linguistics
☆185 · Sep 16, 2019 · Updated 6 years ago
verypluming / JaNLI
☆17 · May 31, 2023 · Updated 3 years ago