ko-nlp/moducorpus-sanitizer

Provides functionality for converting Modu Corpus (모두의 말뭉치) data into a form that is convenient for analysis.

☆11

Alternatives and similar repositories for moducorpus-sanitizer

Users who are interested in moducorpus-sanitizer compare it to the libraries listed below.

songys / 2021Langcon
☆11 · Oct 3, 2021 · Updated 4 years ago
hamanlp / hama-py
🦛 Python library for Korean (Hangul) text processing. Python Korean Morphological Analyzer
☆19 · Feb 4, 2025 · Updated last year
passing2961 / EmoNSMC
Large Korean emotion-labeled dataset (EmoNSMC)
☆14 · Mar 5, 2020 · Updated 6 years ago
baikalai / baikal-bert
baikal.ai's pre-trained BERT models: descriptions and sample code
☆12 · Jun 24, 2021 · Updated 5 years ago
korean-named-entity / konec
Korean Named Entity Corpus
☆25 · May 12, 2023 · Updated 3 years ago
openkorpos / model-mecab
MeCab model trained with OpenKorPos.
☆23 · Jun 19, 2022 · Updated 4 years ago
hyunwoongko / beyond-lm
Beyond LM: How can language models go forward in the future?
☆15 · Apr 30, 2023 · Updated 3 years ago
nlpai-lab / Korean-CommonGen
[Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
☆11 · May 27, 2022 · Updated 4 years ago
BitnaKeum / Web_Crawler
Crawlers for Namuwiki, Wikipedia, Daum Blog, Tistory, YouTube, and Nate Pann
☆13 · Feb 20, 2026 · Updated 5 months ago
formidable-stella / ShareGPT-translation
☆21 · May 24, 2023 · Updated 3 years ago
monologg / ko_lm_dataformat
A utility for storing and reading files for Korean LM training 💾
☆35 · Jul 18, 2026 · Updated 2 weeks ago
jason9693 / polyglot-finetuning-oslo
☆19 · Sep 20, 2022 · Updated 3 years ago
triplet02 / KoNPron
Convert Numerical Representations to Korean Pronunciation
☆14 · Apr 20, 2020 · Updated 6 years ago
korean-named-entity / konne
Korean Nested Named Entity Corpus
☆20 · May 13, 2023 · Updated 3 years ago
JoungheeKim / kor-spacing
A project for automatic Korean word spacing
☆12 · Aug 3, 2020 · Updated 5 years ago
jinmang2 / DOOLY
🦕 A library that handles everything with 🤗 and supports batching to models in PORORO
☆37 · Jun 16, 2022 · Updated 4 years ago
korean-named-entity / konne-prep
☆19 · Jan 29, 2023 · Updated 3 years ago
j-min / korean-parallel-corpora
Korean Parallel Corpus
☆11 · Nov 27, 2014 · Updated 11 years ago
jwkanggist / SSL-narratives-NLP-1
Self-supervised learning in NLP, read in reverse
☆27 · Oct 30, 2022 · Updated 3 years ago
seopbo / nlp_tutorials
Performing downstream tasks with huggingface
☆62 · Dec 28, 2021 · Updated 4 years ago
passing2961 / KMRE
Korean Movie Review Emotion (KMRE) Dataset
☆21 · Sep 7, 2020 · Updated 5 years ago
facebookresearch / ketod
KETOD: Knowledge-Enriched Task-Oriented Dialogue
☆33 · Jan 4, 2023 · Updated 3 years ago
jason9693 / oslo-kogpt-finetunig
An example of fine-tuning KoGPT with OSLO.
☆23 · Aug 26, 2022 · Updated 3 years ago
lovit / namuwikitext
Wikitext-format dataset of Namuwiki (the best-known Korean wiki)
☆53 · Oct 25, 2020 · Updated 5 years ago
lovit / levenshtein_finder
Similar-string search using Levenshtein distance
☆21 · Jun 19, 2021 · Updated 5 years ago
smothly / bad-word-detection
Profanity detection model
☆16 · Dec 19, 2019 · Updated 6 years ago
tunib-ai / tunib-electra
Korean-English Bilingual Electra Models
☆110 · Nov 22, 2021 · Updated 4 years ago
MrBananaHuman / KoreanCharacterBert
Korean BERT model using character tokenizer
☆27 · Apr 8, 2021 · Updated 5 years ago
momozzing / kiosk_bot
Kiosk chatbot based on KoGPT-2 fine-tuning
☆12 · Dec 12, 2023 · Updated 2 years ago
MrBananaHuman / PangyoCorpora
☆38 · Oct 4, 2023 · Updated 2 years ago
QuoQA-NLP / Ko-conceptual-captions
Google's Conceptual Captions Dataset translated into Korean
☆23 · Aug 28, 2022 · Updated 3 years ago
tunib-ai / artwork_captions
Machine Generated Captions for Best Artworks
☆22 · Sep 21, 2022 · Updated 3 years ago
hyunwoongko / python-mecab-kor
Yet another python binding for mecab-ko
☆88 · May 16, 2023 · Updated 3 years ago
KLUE-benchmark / KLUE-baseline
Finetuning Pipeline
☆89 · Feb 25, 2022 · Updated 4 years ago
MrBananaHuman / open-korean-instructions
A collection of public Korean instruction datasets for training language models.
☆19 · Jul 16, 2023 · Updated 3 years ago
boychaboy / KOLD
KOLD: Korean Offensive Language Dataset
☆83 · Nov 13, 2022 · Updated 3 years ago
dobby-seo / korean-speech-recognition-quartznet
Korean speech recognition with QuartzNet, a quantized Jasper-based model
☆22 · Jul 21, 2021 · Updated 5 years ago
tunib-ai / KMWP
Korean Math Word Problems
☆59 · Jan 14, 2022 · Updated 4 years ago
lih0905 / WSD_kor
Korean lexical semantic analysis model
☆25 · Apr 4, 2022 · Updated 4 years ago