A collection of various NLP datasets, mainly for Indonesia-related languages.
☆15Apr 23, 2022Updated 3 years ago
Alternatives and similar repositories for nlp-datasets
Users that are interested in nlp-datasets are comparing it to the libraries listed below
- ☆11Aug 26, 2021Updated 4 years ago
- ☯️ AllenNLP training configurations for promising models on Named Entity Recognition. (BiLSTM-CRF, BiLSTM-CNN-CRF, BERT, BERT-CRF)☆15Nov 26, 2020Updated 5 years ago
- DefSent: Sentence Embeddings using Definition Sentences☆22Aug 5, 2021Updated 4 years ago
- A Japanese dependency parser based on BERT☆23Oct 26, 2022Updated 3 years ago
- DMV/CCM implementation☆17Jul 14, 2016Updated 9 years ago
- ☆19May 23, 2024Updated last year
- An annotation tool for grounding of formulae☆24May 28, 2024Updated last year
- benchmarks for LLM tokenizers☆17Feb 27, 2026Updated last week
- NLP Datasets for Indonesian☆126Feb 11, 2023Updated 3 years ago
- This repository is about how to build an SQLite version of the Arabic WordNet database.☆10Mar 19, 2019Updated 6 years ago
- Collection of links to blogs/ resources on various ML topics☆13Jun 15, 2022Updated 3 years ago
- OpenNMT Colab Tutorial Pytorch && Tensorflow☆31Nov 18, 2019Updated 6 years ago
- A curated list of research papers and resources on Indonesian languages☆40Mar 21, 2024Updated last year
- A set of base classes for running training scripts for Neural Networks (by means of SNNS) and SVM (by means of SVM Light and SVM …☆14Jun 24, 2011Updated 14 years ago
- Implementation examples using Chainer.☆11Apr 7, 2018Updated 7 years ago
- [AAAI'23] FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction https://arxiv.org/abs/2304.00902☆10Apr 9, 2023Updated 2 years ago
- No Gabut Challenge Submission☆10Mar 2, 2021Updated 5 years ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best …☆10Nov 3, 2023Updated 2 years ago
- Node.js wrapper for the GLTF2Loader library from Three.js☆10Nov 8, 2017Updated 8 years ago
- Performs tasks together with GPT.☆13Apr 4, 2023Updated 2 years ago
- 🍺 A Homebrew keg that specializes in Natural Language Processing.☆22May 23, 2018Updated 7 years ago
- ☆10Jun 4, 2020Updated 5 years ago
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github…☆14Dec 19, 2022Updated 3 years ago
- MG top-down beam parsing☆13Jul 2, 2018Updated 7 years ago
- Named Entity (NER) annotations of the Hebrew Treebank (Haaretz newspaper) corpus, including: morpheme and token level NER labels, nested …☆10Dec 27, 2021Updated 4 years ago
- A collection of English tweets annotated in Universal Dependencies.☆39Oct 20, 2021Updated 4 years ago
- A simple script for converting Latin script to Javanese script☆35May 2, 2023Updated 2 years ago
- Japanese BERT trained on Aozora Bunko and Wikipedia, pre-tokenized by MeCab with UniDic & SudachiPy☆40Aug 8, 2020Updated 5 years ago
- Phonetically balanced text to speech sentences☆10Aug 16, 2021Updated 4 years ago
- Supplementary materials for "Evaluating generalised additive mixed modelling strategies for dynamic speech analysis"☆10Jan 25, 2021Updated 5 years ago
- Concise, powerful asynchronous flow control library for JavaScript☆84Jun 29, 2017Updated 8 years ago
- Unsupervised Grammar Induction with Combinatory Categorial Grammars☆10Jan 28, 2021Updated 5 years ago
- A language server implementation for pysen☆10Nov 14, 2021Updated 4 years ago
- A corpus of diacritized Hebrew texts (טקסט מנוקד)☆11May 4, 2022Updated 3 years ago
- Behavioral probing of language acquisition models at the lexical and syntactic level☆17Jul 17, 2023Updated 2 years ago
- The Qur'an packaged in the form of a chatbot☆15Dec 1, 2020Updated 5 years ago
- ☆10Dec 11, 2016Updated 9 years ago
- Natural Language Inflection in English☆11Jan 10, 2022Updated 4 years ago
- Mirror of GlottHMM☆10Jun 7, 2016Updated 9 years ago