EleutherAI/polyglot-data

Data-related codebase for the polyglot project

☆19

Alternatives and similar repositories for polyglot-data

Users who are interested in polyglot-data are comparing it to the libraries listed below.

jason9693 / oslo-kogpt-finetunig
An example of fine-tuning KoGPT with OSLO.
☆23 · Aug 26, 2022 · Updated 3 years ago
Beomi / easy-lm-trainer
🤗 Sample code for training an LM with minimal setup
☆59 · May 23, 2023 · Updated 3 years ago
jason9693 / ETA4LLMs
Calculating Expected Time for training LLM.
☆39 · Apr 17, 2023 · Updated 3 years ago
AIRC-KETI / Korean-Copora
☆14 · Dec 9, 2021 · Updated 4 years ago
nlpai-lab / Korean-CommonGen
[Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
☆11 · May 27, 2022 · Updated 4 years ago
EleutherAI / polyglot
Polyglot: Large Language Models of Well-balanced Competence in Multi-languages
☆487 · Aug 22, 2023 · Updated 2 years ago
hyunwoongko / beyond-lm
Beyond LM: How can language model go forward in the future?
☆15 · Apr 30, 2023 · Updated 3 years ago
wisenut-research / KoT5
Korean T5 model
☆56 · Dec 7, 2021 · Updated 4 years ago
EleutherAI / oslo
OSLO: Open Source for Large-scale Optimization
☆175 · Sep 9, 2023 · Updated 2 years ago
tunib-ai / artwork_captions
Machine Generated Captions for Best Artworks
☆22 · Sep 21, 2022 · Updated 3 years ago
MrBananaHuman / open-korean-instructions
A collection of publicly available Korean instruction datasets for training language models.
☆19 · Jul 16, 2023 · Updated 3 years ago
simonjisu / annotated-transformer-kr
annotated-transformer-kr
☆15 · May 16, 2019 · Updated 7 years ago
hyunwoongko / python-mecab-kor
Yet another python binding for mecab-ko
☆88 · May 16, 2023 · Updated 3 years ago
EleutherAI / dps
Data processing system for polyglot
☆93 · Jul 6, 2026 · Updated 2 weeks ago
nayohan / SimKoR
[HCLT 2022] Korean sentence text similarity dataset using naver shopping review
☆25 · Oct 20, 2022 · Updated 3 years ago
hyunwoongko / pydatrie
Pure python implementation of DARTS (Double ARray Trie System)
☆24 · Dec 7, 2022 · Updated 3 years ago
korean-named-entity / konec
Korean Named Entity Corpus
☆25 · May 12, 2023 · Updated 3 years ago
tienthanhdhcn / VnAPE
Automatic Post-Editing for Vietnamese
☆13 · Nov 8, 2021 · Updated 4 years ago
GunwooHan / nunbody_segmentation
Alchera AI Competition 2nd Solution (body part segmentation)
☆23 · Dec 7, 2021 · Updated 4 years ago
LG-NLP / KorWikiTableQuestions
This repo is for Korean wiki table question answering datasets described in the paper of Korean-Specific Dataset for Table Question Answe…
☆91 · Oct 22, 2024 · Updated last year
jooinjang / Ko-ATOMIC
Korean Commonsense Knowledge Graph
☆15 · Dec 23, 2022 · Updated 3 years ago
human-rights-corpus / HRC
#인권코퍼스 (Human Rights Corpus)
☆31 · Oct 6, 2023 · Updated 2 years ago
lovit / levenshtein_finder
Similar string search in Levenshtein distance
☆21 · Jun 19, 2021 · Updated 5 years ago
baikalai / baikal-bert
baikal.ai's pre-trained BERT models: descriptions and sample codes
☆12 · Jun 24, 2021 · Updated 5 years ago
taeminlee / KoGPT2-Transformers
KoGPT2 on Huggingface Transformers
☆33 · May 4, 2021 · Updated 5 years ago
bvanaken / clinical-assertion-data
Dataset for the NLPMC @ NAACL 2021 Paper: Assertion Detection in Clinical Notes: Medical Language Models to the Rescue?
☆16 · Sep 28, 2021 · Updated 4 years ago
kyunghoon-jung / RL_implementation
RL Implementation
☆19 · May 10, 2022 · Updated 4 years ago
hyunwoongko / dialobot
Opensource chatbot framework
☆16 · Aug 1, 2021 · Updated 4 years ago
ko-nlp / moducorpus-sanitizer
Provides functionality for converting Modu Corpus (모두의 말뭉치) data into a form convenient for analysis.
☆11 · Mar 2, 2022 · Updated 4 years ago
seopbo / py-automate
Notes compiled from a Python course on work automation
☆13 · Oct 10, 2017 · Updated 8 years ago
hunsii / LawBot
A conversational system for searching similar legal precedents using an LLM.
☆27 · Jul 3, 2023 · Updated 3 years ago
yeonsw / RankEncoder
☆35 · May 18, 2023 · Updated 3 years ago
jwkanggist / SSL-narratives-NLP-1
Reading self-supervised learning in NLP backwards
☆27 · Oct 30, 2022 · Updated 3 years ago
boostcampaitech2 / final-project-level3-nlp-02
final-project-level3-nlp-02 created by GitHub Classroom
☆11 · Dec 31, 2021 · Updated 4 years ago
EleutherAI / hae-rae
☆33 · Aug 30, 2023 · Updated 2 years ago
J-Seo / Korean-CommonGen
[Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
☆28 · Dec 9, 2022 · Updated 3 years ago
kakao / OrchestrationBench
☆48 · Apr 17, 2026 · Updated 3 months ago
tunib-ai / large-scale-lm-tutorials
Large-scale language modeling tutorials with PyTorch
☆293 · Nov 2, 2021 · Updated 4 years ago
irfananda00 / Crawler-using-Scrapy
Crawling some e-commerce site in Indonesia (blibli, bukalapak, lazada, mataharimall, and tokopedia) using python scrapy and save the craw…
☆10 · Jan 28, 2017 · Updated 9 years ago