tunib-ai/large-scale-lm-tutorials

Large-scale language modeling tutorials with PyTorch

☆293

Alternatives and similar repositories for large-scale-lm-tutorials

Users who are interested in large-scale-lm-tutorials are comparing it to the libraries listed below.

tunib-ai / oslo
OSLO: Open Source framework for Large-scale model Optimization
☆309 · Aug 25, 2022 · Updated 3 years ago
EleutherAI / polyglot
Polyglot: Large Language Models of Well-balanced Competence in Multi-languages
☆487 · Aug 22, 2023 · Updated 2 years ago
lassl / lassl
Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets
☆130 · Nov 12, 2022 · Updated 3 years ago
jiphyeonjeon / season2
Jiphyeonjeon Season 2
☆117 · May 16, 2022 · Updated 4 years ago
seopbo / nlp_tutorials
Performing downstream tasks using huggingface
☆62 · Dec 28, 2021 · Updated 4 years ago
tunib-ai / parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
☆788 · Apr 24, 2023 · Updated 3 years ago
kakaobrain / kortok
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)
☆119 · Oct 8, 2020 · Updated 5 years ago
KLUE-benchmark / KLUE-baseline
Finetuning Pipeline
☆89 · Feb 25, 2022 · Updated 4 years ago
yukyunglee / Transformer_Survey_Study
"A survey of Transformer" paper study 👩🏻‍💻🧑🏻‍💻 KoreaUniv. DSBA Lab
☆184 · Nov 4, 2021 · Updated 4 years ago
EleutherAI / oslo
OSLO: Open Source for Large-scale Optimization
☆175 · Sep 9, 2023 · Updated 2 years ago
Huffon / klue-transformers-tutorial
HuggingFace Transformers tutorial using KLUE data
☆129 · Jun 28, 2021 · Updated 5 years ago
monologg / ko_lm_dataformat
A utility for storing and reading files for Korean LM training 💾
☆35 · Updated this week
KLUE-benchmark / KLUE
📖 Korean NLU Benchmark
☆602 · Jun 30, 2026 · Updated 3 weeks ago
monologg / KoBigBird
🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)
☆202 · Dec 28, 2023 · Updated 2 years ago
tunib-ai / tunib-electra
Korean-English Bilingual Electra Models
☆110 · Nov 22, 2021 · Updated 4 years ago
boychaboy / KOLD
KOLD: Korean Offensive Language Dataset
☆83 · Nov 13, 2022 · Updated 3 years ago
tunib-ai / DKTC
Dataset of Korean Threatening Conversations
☆74 · Nov 1, 2022 · Updated 3 years ago
hyunwoongko / kobart-transformers
Kobart model on Huggingface transformers
☆64 · Feb 15, 2022 · Updated 4 years ago
LG-NLP / KorWikiTableQuestions
This repo is for Korean wiki table question answering datasets described in the paper of Korean-Specific Dataset for Table Question Answe…
☆91 · Oct 22, 2024 · Updated last year
jungwoo-ha / WeeklyArxivTalk
[Zoom & Facebook Live] Weekly AI Arxiv Season 2
☆961 · Aug 27, 2023 · Updated 2 years ago
smilegate-ai / HuLiC
☆93 · Mar 3, 2022 · Updated 4 years ago
VumBleBot / odqa_baseline_code
Baseline code for Korean open domain question answering (ODQA)
☆76 · Aug 11, 2023 · Updated 2 years ago
hyunwoongko / pecab
Pecab: Pure python Korean morpheme analyzer based on Mecab
☆172 · Apr 27, 2024 · Updated 2 years ago
hyunwoongko / bigdata-lecture
2020 CBNU summer vacation data campus machine learning lecture materials
☆19 · Nov 21, 2020 · Updated 5 years ago
sooftware / Korean-PLM
List of Korean pre-trained language models.
☆189 · Aug 31, 2023 · Updated 2 years ago
hyunwoongko / kss
KSS: Korean String processing Suite
☆471 · Nov 13, 2025 · Updated 8 months ago
nlpai-lab / Korean-CommonGen
[Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
☆11 · May 27, 2022 · Updated 4 years ago
hyunwoongko / python-mecab-kor
Yet another python binding for mecab-ko
☆88 · May 16, 2023 · Updated 3 years ago
sooftware / nlp-tasks
Natural Language Processing Tasks and Examples.
☆62 · Aug 17, 2022 · Updated 3 years ago
monologg / KoELECTRA
Pretrained ELECTRA Model for Korean
☆637 · Feb 19, 2024 · Updated 2 years ago
smilegate-ai / korean_unsmile_dataset
☆446 · Apr 8, 2022 · Updated 4 years ago
AIRC-KETI / ke-t5
☆198 · May 22, 2023 · Updated 3 years ago
naver-ai / carecall-corpus
CareCall for Seniors: Role Specified Open-Domain Dialogue dataset generated by leveraging LLMs (NAACL 2022).
☆62 · May 3, 2022 · Updated 4 years ago
HeegyuKim / open-korean-instructions
A collection of open Korean instruction datasets for training language models.
☆469 · Apr 13, 2025 · Updated last year
kakaobrain / pororo
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
☆1,303 · Mar 23, 2022 · Updated 4 years ago
cosmoquester / 2021-dialogue-summary-competition
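[2021 Hunminjeongeum Korean Speech and Natural Language AI Competition] A repo for sharing the dialogue summarization training and inference code of team 알라꿍달라꿍 in the dialogue summarization track.
☆127 · Jul 11, 2022 · Updated 4 years ago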
jiphyeonjeon / season1
Jiphyeonjeon Season 1
☆176 · May 19, 2021 · Updated 5 years ago
toriving / KoEDA
Korean Easy Data Augmentation
☆91 · Sep 30, 2021 · Updated 4 years ago
hyunwoongko / nlp-datasets
Curation note of NLP datasets
☆99 · Dec 6, 2022 · Updated 3 years ago