cynthia/kosentences

Large-scale unannotated Korean corpus for unsupervised tasks (e.g., language modeling).

☆28

Alternatives and similar repositories for kosentences

Users who are interested in kosentences are comparing it to the repositories listed below.

warnikchow / prosem
Prosody-semantics Interface in Seoul Korean
☆12 · Oct 9, 2020 · Updated 5 years ago
YongWookHa / kor-text-preprocess
Korean text data preprocessing toolkit for NLP
☆18 · Jun 11, 2019 · Updated 7 years ago
songys / single_turn_dialogue
Data consisting only of dialogue example sentences extracted from dictionaries
☆16 · Apr 24, 2023 · Updated 3 years ago
SungjoonPark / DeepNLP2
Deep NLP 2 (2019.3-5)
☆10 · Feb 19, 2019 · Updated 7 years ago
fromSun2Moon / KoreanF2I
Korean honorific speech correction
☆26 · Dec 31, 2022 · Updated 3 years ago
EleutherAI / hae-rae
☆33 · Aug 30, 2023 · Updated 2 years ago
ko-nlp / Open-korean-corpora
Open Korean NLP Dataset Curation for the Users All Around the Globe
☆158 · Jun 17, 2026 · Updated last month
jeongukjae / korean-wikipedia-corpus
Korean Wikipedia corpus segmented into sentences. Download it from Releases or use it through tfds-korean.
☆24 · Sep 6, 2023 · Updated 2 years ago
MrBananaHuman / KalBert
Korean ALBERT
☆46 · Nov 11, 2019 · Updated 6 years ago
Kyubyong / KoParadigm
KoParadigm: Korean Inflectional Paradigm Generator
☆60 · Nov 23, 2022 · Updated 3 years ago
openkorpos / model-mecab
MeCab model trained with OpenKorPos.
☆23 · Jun 19, 2022 · Updated 4 years ago
jeongukjae / tfds-korean
A collection of Korean text datasets ready to use with TensorFlow Datasets.
☆20 · Jun 8, 2022 · Updated 4 years ago
moon1ite / yajatime
Yajatime (a.k.a. late-night natural language processing time)
☆27 · Mar 31, 2021 · Updated 5 years ago
kakaobrain / kortok
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)
☆119 · Oct 8, 2020 · Updated 5 years ago
warnikchow / kosp2e
Korean Speech to English Translation Corpus
☆45 · Sep 3, 2021 · Updated 4 years ago
songhyunje / kma
Korean morphological analyzer
☆29 · Dec 22, 2019 · Updated 6 years ago
lovit / huggingface_konlpy
Training Transformers of Huggingface with KoNLPy
☆68 · Aug 28, 2020 · Updated 5 years ago
jeongukjae / namuwiki-corpus
Namuwiki dataset segmented into sentences. Download it from Releases or through tfds-korean.
☆19 · Jun 16, 2021 · Updated 5 years ago
hkjeon13 / noising-korean
Adds noise to Korean documents.
☆27 · Nov 9, 2022 · Updated 3 years ago
jeongukjae / korean-spacing-model
A Korean sentence spacing model (deleting/adding spaces). Written so that you can train it yourself after preparing the data.
☆56 · Jul 11, 2022 · Updated 4 years ago
moon1ite / koco
Easy installer of kocohub dataset
☆24 · May 31, 2020 · Updated 6 years ago
yseokchoi / SejongTree2Dependency
A tool for converting the Sejong syntactically parsed corpus into dependency structures
☆10 · Sep 7, 2018 · Updated 7 years ago
KPFBERT / kpfbertsum
☆15 · Nov 28, 2021 · Updated 4 years ago
jungyeul / korean-parallel-corpora
Korean Parallel Corpus
☆147 · Feb 24, 2024 · Updated 2 years ago
warnikchow / dlk2nlp
Day-by-day line-by-line Keras-based Korean NLP
☆92 · Nov 21, 2022 · Updated 3 years ago
lovit / textmining_dataset
Dataset handler for text mining practice
☆38 · Dec 6, 2019 · Updated 6 years ago
UniversalDependencies / UD_Korean-Kaist
Data from KAIST (a Korean treebank).
☆19 · May 6, 2026 · Updated 2 months ago
forkonlp / newspaper
A project to build a crawler that aims to collect news from most newspapers
☆11 · Jul 29, 2019 · Updated 6 years ago
dsindex / ntagger
Reference PyTorch code for named entity tagging
☆87 · Oct 18, 2024 · Updated last year
modulabs / beyondBERT
A repository that organizes the discussion notes from beyondBERT, cohort 11.5.
☆57 · Jul 2, 2020 · Updated 6 years ago
emorynlp / ud-korean
Universal Dependency Treebanks in Korean
☆39 · Dec 19, 2021 · Updated 4 years ago
haven-jeon / KoWordSpacing
Korean Word Spacing with RNN.
☆21 · Sep 23, 2017 · Updated 8 years ago
Data-Intelligence-Lab / DEFT-korean-alpaca
☆23 · Oct 30, 2023 · Updated 2 years ago
monologg / DistilKoBERT
Distillation of KoBERT from SKTBrain (Lightweight KoBERT)
☆200 · Sep 6, 2023 · Updated 2 years ago
eagle705 / korean-ner-cnn-bilstm
A Korean named entity recognizer based on CNN+BiLSTM.
☆57 · Nov 26, 2019 · Updated 6 years ago
SungjoonPark / KoreanWordVectors
Subword-level Word Vector Representations for Korean (ACL 2018)
☆107 · Oct 17, 2019 · Updated 6 years ago
jinmang2 / AdvancedTransformers
⛩ All about Korean Transformers (information and tutorial)
☆17 · Jun 21, 2022 · Updated 4 years ago
ModuNLP / weekly-meeting
Meeting every Thursday at 20:00
☆16 · Jul 24, 2020 · Updated 6 years ago
likejazz / korean-sentence-splitter
Split Korean text into sentences using a heuristic algorithm.
☆216 · Dec 24, 2020 · Updated 5 years ago