sharavsambuu / english-mongolian-nmt-dataset-augmentation
Generate a 1 million-sample warm-up dataset for neural machine translation from a 700 million-word Mongolian text corpus using the Google Translate service
☆17Updated 3 months ago
Related projects
Alternatives and complementary repositories for english-mongolian-nmt-dataset-augmentation
- Cyrillic Mongolian text classification with TensorFlow 2, and also some fine-tuning on TugsTugi's Mongolian BERT model and other NLP expe…☆32Updated last year
- Useful resources for Mongolian NLP☆172Updated last year
- Pre-trained Mongolian BERT models☆43Updated 3 years ago
- Mongolian speech recognition with PyTorch☆129Updated 3 years ago
- The Mongolian Wordnet (MonWN)☆17Updated 2 years ago
- Mongolian spellchecking dictionary☆37Updated this week
- Pytorch-Named-Entity-Recognition-with-BERT☆15Updated 4 years ago
- Text to Speech with PyTorch (English and Mongolian)☆184Updated last month
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation☆24Updated last year
- Lecture and seminar materials for Deep Learning summer school in Ulaanbaatar, 2021☆10Updated 3 years ago
- Text and Punctuation correction with Deep Learning☆129Updated 4 years ago
- Lecture and seminar materials for Deep Learning summer school in Ulaanbaatar, 2019☆12Updated 2 years ago
- Experimental project to punctuate text using an embedding layer, a single convolutional layer, and an output softmax layer, written in Keras.☆83Updated 4 years ago
- Arabic edition of BERT pretrained language models☆127Updated 3 years ago
- A GitHub Action to easily run rasa train and rasa test in CI.☆34Updated last year
- Open source speech to text models for Indic Languages☆287Updated 2 years ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span…☆70Updated last week
- Deep Learning neural network for correcting spelling☆54Updated last year
- ALBERT trained on Mongolian text corpus☆18Updated 3 years ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆36Updated last year
- Collection of Urdu datasets for POS, NER, Sentiment, Summarization and NLP tasks.☆70Updated 3 months ago
- TUFS Asian Language Parallel Corpus☆48Updated last year
- An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.☆103Updated 3 years ago
- Jupyter Notebooks for creating Speech datasets☆46Updated 5 years ago
- SOTA punctuation restoration (e.g. for automatic speech recognition) deep learning model based on a pre-trained BERT model☆179Updated 5 years ago
- Crowd sourced training data for Rasa NLU models☆199Updated 10 months ago
- API for viewing U-Money bus routes☆16Updated 3 years ago
- Improved Sentence Alignment in Linear Time and Space☆163Updated last year
- This is a repository of the Multi-dialect Arabic BERT model.☆38Updated 4 years ago
- Segment an audio file and obtain utterance alignments. (Python package)☆321Updated 5 months ago