kakaobrain / kortok
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)
☆119 · Oct 8, 2020 · Updated 5 years ago
Alternatives and similar repositories for kortok
Users that are interested in kortok are comparing it to the libraries listed below
- A utility for storing and reading files for Korean LM training 💾 ☆35 · Oct 15, 2025 · Updated 3 months ago
- Adds noise to Korean documents. ☆27 · Nov 9, 2022 · Updated 3 years ago
- [Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation ☆11 · May 27, 2022 · Updated 3 years ago
- Training Transformers of Huggingface with KoNLPy ☆68 · Aug 28, 2020 · Updated 5 years ago
- Character-level Korean ELECTRA Model (syllable-level Korean ELECTRA) ☆54 · Jun 12, 2023 · Updated 2 years ago
- Finetuning Pipeline ☆89 · Feb 25, 2022 · Updated 3 years ago
- BERTScore for Korean ☆80 · Feb 22, 2024 · Updated last year
- Provides utilities for converting Modu Corpus data into a form convenient for analysis. ☆11 · Mar 2, 2022 · Updated 3 years ago
- KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding ☆309 · Jul 9, 2023 · Updated 2 years ago
- Open Korean NLP Dataset Curation for the Users All Around the Globe ☆152 · Nov 18, 2023 · Updated 2 years ago
- Korean Visual Question Answering ☆59 · Feb 18, 2020 · Updated 5 years ago
- baikal.ai's pre-trained BERT models: descriptions and sample codes ☆12 · Jun 24, 2021 · Updated 4 years ago
- Korean NLU Benchmark ☆587 · Jul 6, 2022 · Updated 3 years ago
- Subword-level Word Vector Representations for Korean (ACL 2018) ☆107 · Oct 17, 2019 · Updated 6 years ago
- 🤗 Pretrained BERT model & WordPiece tokenizer trained on Korean Comments (a BERT model pretrained on Korean comments, and its dataset) ☆495 · Nov 7, 2022 · Updated 3 years ago
- A list of Korean startups that research and develop natural language processing technology ☆165 · May 10, 2020 · Updated 5 years ago
- Initial-consonant (choseong) interpreter based on ko-BART ☆29 · Mar 31, 2021 · Updated 4 years ago
- Pretrained ELECTRA Model for Korean ☆630 · Feb 19, 2024 · Updated last year
- Sentence Embeddings using Siamese ETRI KoBERT ☆163 · Aug 16, 2025 · Updated 5 months ago
- ์ผ์ํ์ (a.k.a. ์ผ๋ฐค์ ์์ฐ์ด์ฒ๋ฆฌ ํ์) ☆27 · Mar 31, 2021 · Updated 4 years ago
- Korean Wikipedia corpus segmented into sentences. Download it from Releases or use it through tfds-korean. ☆24 · Sep 6, 2023 · Updated 2 years ago
- Pretrained BigBird Model for Korean (up to 4096 tokens) ☆201 · Dec 28, 2023 · Updated 2 years ago
- ☆11 · Aug 12, 2020 · Updated 5 years ago
- Large scale unannotated Korean corpus for unsupervised tasks. (e.g. Language modeling) ☆28 · Aug 11, 2019 · Updated 6 years ago
- Korean Parallel Corpus ☆147 · Feb 24, 2024 · Updated last year
- KoGPT2 on Huggingface Transformers ☆33 · May 4, 2021 · Updated 4 years ago
- KoRean based BERT pre-trained models (KR-BERT) for Tensorflow and PyTorch ☆210 · Apr 24, 2024 · Updated last year
- Korean version of GoEmotions Dataset ☆57 · Jun 12, 2023 · Updated 2 years ago
- Parallel dataset of Korean Questions and Commands ☆60 · Mar 24, 2023 · Updated 2 years ago
- Convert Numerical Representations to Korean Pronunciation ☆14 · Apr 20, 2020 · Updated 5 years ago
- Korean HateSpeech Dataset ☆393 · Jul 18, 2020 · Updated 5 years ago
- #Paired Question ☆24 · Jun 16, 2020 · Updated 5 years ago
- Links to Korean datasets ☆900 · Oct 14, 2024 · Updated last year
- ELECTRA-based Korean conversational language model ☆53 · Aug 4, 2021 · Updated 4 years ago
- Repository summarizing the discussions from beyondBERT (cohort 11.5). ☆57 · Jul 2, 2020 · Updated 5 years ago
- Namuwiki dataset segmented into sentences. Download it from Releases or through tfds-korean. ☆19 · Jun 16, 2021 · Updated 4 years ago
- KSS: Korean String processing Suite ☆468 · Nov 13, 2025 · Updated 3 months ago
- KOLD: Korean Offensive Language Dataset ☆81 · Nov 13, 2022 · Updated 3 years ago
- 🤗 Sample code for training an LM with minimal setup ☆59 · May 23, 2023 · Updated 2 years ago