mesolitica / malaysian-dataset
We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/
☆312 Updated last week
Alternatives and similar repositories for malaysian-dataset:
Users who are interested in malaysian-dataset are comparing it to the libraries listed below.
- Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/ ☆483 Updated last week
- [MalayMMLU] This is the first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) … ☆31 Updated 3 months ago
- Speech Toolkit for Malaysian language, https://malaya-speech.readthedocs.io/ ☆249 Updated last week
- A collection of NLP resources for Malay ☆25 Updated 6 years ago
- Official data on Malaysia's National Covid-19 Immunisation Programme (PICK). Powered by MySejahtera. ☆494 Updated last month
- Indonesian Language Models and Their Usage ☆157 Updated last year
- TUFS Asian Language Parallel Corpus ☆50 Updated last year
- Scraping MalaysianPayGap & Extracting data from the Instagram posts ☆75 Updated 2 years ago
- Gathers TensorFlow deep learning models for Bahasa Malaysia NLP problems ☆28 Updated 5 years ago
- A dataset for Indonesian Named Entity Recognizer ☆30 Updated 4 years ago
- This repository contains the Arabic sarcasm dataset (ArSarcasm) ☆24 Updated 4 years ago
- Sarjana is an open source desktop application which is used to assist in reading information materials, be it research papers or technica… ☆22 Updated 8 months ago
- Zero-shot Transfer Learning from English to Arabic ☆29 Updated 2 years ago
- Husein pet projects in here! ☆49 Updated last year
- ☆12 Updated 3 years ago
- Pre-process Arabic text (remove diacritics, punctuation and repeating characters) ☆106 Updated 7 years ago
- Welcome to our repository! This repository hosts the data on "IndoCollex: A Testbed for Morphological Transformation of Indonesian Word … ☆20 Updated 3 years ago
- Arabic edition of BERT pretrained language models ☆128 Updated 4 years ago
- A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures. ☆78 Updated 2 months ago
- The first-ever vast natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, … ☆71 Updated 4 months ago
- Multilingual Neural Machine Translation using Transformers with Conditional Normalization. ☆18 Updated 2 years ago
- A list of Indonesian NLP resources. ☆279 Updated 3 years ago
- ANETAC: Arabic Named Entity Transliteration and Classification Dataset ☆34 Updated 5 years ago
- Arabic Dialect Identification on AOC data. ☆24 Updated 6 years ago
- The first large-scale summarization corpus for the Indonesian language. AACL 2020. ☆35 Updated 4 years ago
- This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences fro… ☆160 Updated 6 months ago
- This is a diacritization model for Arabic language. This model was built/trained using the Tashkeela: the Arabic diacritization corpus on… ☆42 Updated last year
- Named Entity Recognition for Bahasa Indonesia ☆55 Updated 8 years ago
- ☆109 Updated last year
- ☆43 Updated 9 years ago