Data processing system for polyglot
☆93Sep 5, 2023Updated 2 years ago
Alternatives and similar repositories for dps
Users who are interested in dps are comparing it to the libraries listed below
- ☆19Sep 20, 2022Updated 3 years ago
- Beyond LM: How can language model go forward in the future?☆15Apr 30, 2023Updated 2 years ago
- An example of fine-tuning kogpt with oslo.☆23Aug 26, 2022Updated 3 years ago
- ☆36Oct 4, 2023Updated 2 years ago
- 🤗 Sample code for training an LM with minimal setup☆59May 23, 2023Updated 2 years ago
- Korean Named Entity Corpus☆25May 12, 2023Updated 2 years ago
- #인권코퍼스 (Human Rights Corpus)☆31Oct 6, 2023Updated 2 years ago
- OSLO: Open Source for Large-scale Optimization☆175Sep 9, 2023Updated 2 years ago
- Yet another python binding for mecab-ko☆88May 16, 2023Updated 2 years ago
- Pecab: Pure python Korean morpheme analyzer based on Mecab☆172Apr 27, 2024Updated last year
- Polyglot: Large Language Models of Well-balanced Competence in Multi-languages☆484Aug 22, 2023Updated 2 years ago
- ☆34Feb 27, 2024Updated 2 years ago
- MeCab model trained with OpenKorPos.☆23Jun 19, 2022Updated 3 years ago
- Difference-based Contrastive Learning for Korean Sentence Embeddings☆23Updated this week
- A collection of public Korean instruction datasets for training language models.☆19Jul 16, 2023Updated 2 years ago
- CareCall for Seniors: Role Specified Open-Domain Dialogue dataset generated by leveraging LLMs (NAACL 2022).☆60May 3, 2022Updated 3 years ago
- Data-related codebase for the polyglot project☆19Mar 30, 2023Updated 2 years ago
- KOLD: Korean Offensive Language Dataset☆81Nov 13, 2022Updated 3 years ago
- ☆197May 22, 2023Updated 2 years ago
- A utility for storing and reading files for Korean LM training 💾☆35Oct 15, 2025Updated 4 months ago
- 42dot LLM consists of a pre-trained language model, 42dot LLM-PLM, and a fine-tuned model, 42dot LLM-SFT, which is trained to respond to …☆131Mar 7, 2024Updated last year
- Megatron LM 11B on Huggingface Transformers☆27Jul 11, 2021Updated 4 years ago
- Abstractive summarization using Bert2Bert framework.☆31Dec 5, 2020Updated 5 years ago
- ☆23Oct 30, 2023Updated 2 years ago
- This repo is for Korean wiki table question answering datasets described in the paper of Korean-Specific Dataset for Table Question Answe…☆91Oct 22, 2024Updated last year
- This is a project for Korean auto spacing☆12Aug 3, 2020Updated 5 years ago
- Bias, Hate classification with KoELECTRA 👿☆27Jun 12, 2023Updated 2 years ago
- Korean Math Word Problems☆59Jan 14, 2022Updated 4 years ago
- [KO-Platy🥮] KO-platypus model: llama-2-ko fine-tuned with Korean-Open-platypus☆73Aug 24, 2025Updated 6 months ago
- Benchmark in Korean Context☆138Sep 26, 2023Updated 2 years ago
- Provides functionality to convert Modu Corpus (모두의 말뭉치) data into a form convenient for analysis.☆11Mar 2, 2022Updated 3 years ago
- ☆11Oct 3, 2021Updated 4 years ago
- ☆19Oct 24, 2023Updated 2 years ago
- Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets☆130Nov 12, 2022Updated 3 years ago
- Dataset of Korean Threatening Conversations☆72Nov 1, 2022Updated 3 years ago
- Translation of the StrategyQA dataset☆23Apr 12, 2024Updated last year
- Machine Generated Captions for Best Artworks☆22Sep 21, 2022Updated 3 years ago
- Korean Movie Review Emotion (KMRE) Dataset☆21Sep 7, 2020Updated 5 years ago
- KSS: Korean String processing Suite☆468Nov 13, 2025Updated 3 months ago