🇰🇷 Korean LLM Datasets | Pre-training, SFT, DPO, RLHF, CoT | Korean LLM dataset curation
★41 · Jan 20, 2026 · Updated 4 months ago
Alternatives and similar repositories for LLM-Ko-Datasets
Users who are interested in LLM-Ko-Datasets are comparing it to the libraries listed below.
- Awesome-SLM: a curated list of Small Language Model · ★30 · Jun 24, 2024 · Updated last year
- ★12 · Oct 3, 2024 · Updated last year
- ★14 · Dec 22, 2024 · Updated last year
- [CVPR 2025] Enhanced OoD Detection through Cross-Modal Alignment of Multi-modal Representations · ★32 · Jun 27, 2025 · Updated 10 months ago
- langchain opentutorial utility package for Open Tutorial · ★10 · Feb 2, 2025 · Updated last year
- Kor-IR: Korean Information Retrieval Benchmark · ★87 · Jul 3, 2024 · Updated last year
- ★109 · Oct 13, 2025 · Updated 7 months ago
- These are papers that I read and reviewed related to NLP, CV, and Deep Learning. You can check paper links and my reviews. · ★13 · Jan 3, 2024 · Updated 2 years ago
- Dataset Resplitting for Generalization in KGQA. See also https://github.com/semantic-systems/KGQA-datasets · ★17 · Jun 29, 2022 · Updated 3 years ago
- Making the transition from Scratch to Python · ★11 · Apr 11, 2017 · Updated 9 years ago
- (ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning" · ★27 · Mar 2, 2026 · Updated 2 months ago
- ★12 · Aug 17, 2023 · Updated 2 years ago
- "Unix & Linux Shell Script Exercise Dictionary" (Hanbit Media) · ★10 · Jan 17, 2017 · Updated 9 years ago
- [Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation · ★11 · May 27, 2022 · Updated 3 years ago
- An educational framework for exploring lightweight multi-agent orchestration. Maintained by the OpenAI Solutions team. · ★16 · Oct 20, 2024 · Updated last year
- ★83 · May 8, 2024 · Updated 2 years ago
- Repository for the lecture "Completing RAG with Agents" by Kane of 모두의 AI. · ★19 · Dec 16, 2025 · Updated 5 months ago
- ★64 · Jul 21, 2025 · Updated 10 months ago
- A curated list of an incredible number of publications related to Dialogue Systems, especially Chatbots and Chit-chat Systems · ★10 · Dec 5, 2019 · Updated 6 years ago
- A tool for converting the Sejong syntactically parsed corpus into dependency parse structures · ★10 · Sep 7, 2018 · Updated 7 years ago
- A toolkit to automatically crawl the paper list and download paper PDFs from the ACL Anthology. · ★11 · Nov 12, 2025 · Updated 6 months ago
- my-claude-code-asset · ★122 · Apr 11, 2026 · Updated last month
- A collection of Python agent samples built with the Google Agent Development Kit (ADK), demonstrating integrations with services like B… · ★21 · May 8, 2026 · Updated last week
- AutoRAG example about benchmarking Korean embeddings. · ★44 · Oct 2, 2024 · Updated last year
- ★19 · Sep 3, 2024 · Updated last year
- KoRean based ELECTRA pre-trained models (KR-ELECTRA) for Tensorflow and PyTorch · ★15 · Feb 13, 2022 · Updated 4 years ago
- From packpub book · ★15 · Mar 9, 2016 · Updated 10 years ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… · ★28 · Mar 14, 2024 · Updated 2 years ago
- Huggies is a plug and play automation tool for AWS Elastic Beanstalk · ★13 · Nov 8, 2017 · Updated 8 years ago
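- Grand Prize winner (Minister of Culture, Sports and Tourism Award) for Korean dependency parsing at the 2019 Korean Language Competition · ★15 · Oct 26, 2022 · Updated 3 years ago
- Python library for Korean (Hangul) text processing. Python Korean Morphological Analyzer · ★19 · Feb 4, 2025 · Updated last year
- Crawlers for Namuwiki, Wikipedia, Daum Blog, Tistory, YouTube, and Nate Pann · ★13 · Feb 20, 2026 · Updated 3 months ago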
- Korean Training Data Set Generator for Google SyntaxNet · ★13 · Jun 27, 2017 · Updated 8 years ago
- ★68 · Dec 29, 2025 · Updated 4 months ago
- bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API. · ★147 · Mar 17, 2026 · Updated 2 months ago
- Official Code Repository for Knowledge-Augmented Language Model Verification (EMNLP 2023) · ★28 · Dec 22, 2023 · Updated 2 years ago
- Covers NLP end to end, through to serving, in a single book. · ★25 · Dec 6, 2025 · Updated 5 months ago
- ★35 · Mar 22, 2026 · Updated last month
- Make running benchmark simple yet maintainable, again. Now only supports Korean-based cross-encoder. · ★33 · Dec 2, 2025 · Updated 5 months ago