Esukhia / Corpora
repo for Tibetan corpora
☆21 Updated last year
Alternatives and similar repositories for Corpora:
Users who are interested in Corpora are comparing it to the repositories listed below.
- ☆17 Updated 7 years ago
- ✒️ དག་བྱེད། Dakje, improving your spelling and readability ☆11 Updated 2 years ago
- 🦜 NLP for Tibetan, in Python. ☆33 Updated last year
- 🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python ☆58 Updated last month
- Linguistically analyzed Classical Tibetan texts ☆25 Updated 3 years ago
- 😎 Curated list of Tibetan NLP projects ☆36 Updated 4 years ago
- TIP-LAS: An open source toolkit for Tibetan word segmentation and part-of-speech tagging ☆81 Updated 2 years ago
- Dataset for TALLIP2019 paper "Ancient-Modern Chinese Translation with a New Large Training Dataset" ☆22 Updated 2 years ago
- A Multi-tasking and Multi-stage Chinese Minority Pre-Trained Language Model ☆10 Updated last year
- Workshop on Language Technologies for Historical and Ancient Language… (https://circse.github.io/LT4HALA/) ☆33 Updated 7 months ago
- A grammatical error correction reading list maintained by Beijing Language and Culture University Natural Language Processing Group ☆24 Updated 4 years ago
- Source codes of Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction ☆43 Updated 3 years ago
- Punctuation restoration in ASR text ☆32 Updated 5 years ago
- Tibetan Language Processing Library ☆18 Updated 6 years ago
- TVsub: DCU-Tencent Chinese-English Dialogue Corpus ☆46 Updated 6 years ago
- ☆42 Updated 6 years ago
- We use phonetics as a feature to create a joint semantic-phonetic embedding and improve the neural machine translation between Chinese an… ☆11 Updated 3 years ago
- Code of zlyang's master dissertation for Chinese grammatical error correction. ☆34 Updated 5 years ago
- Code and data of the paper "MCTS: A Multi-Reference Chinese Text Simplification Dataset". ☆29 Updated 7 months ago
- Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021 ☆60 Updated 3 years ago
- [ACL'21] Data for "An In-depth Study on Internal Structure of Chinese Words". ☆14 Updated 3 years ago
- Efficient Markov Chain word alignment ☆54 Updated 3 years ago
- Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshgh… ☆24 Updated 2 years ago
- Hunspell files for Tibetan ☆22 Updated 9 years ago
- Improved version of GECToR ☆60 Updated last year
- An open-access corpus of conversational bilingual speech in Cantonese and English ☆40 Updated 2 years ago
- The repository for the paper: Rethinking Document-level Neural Machine Translation ☆25 Updated 2 years ago
- ☆20 Updated 6 years ago
- The code for EMNLP2022 paper "Improved grammatical error correction by ranking elementary edits" ☆19 Updated 2 years ago
- Efficient Low-Memory Aligner ☆140 Updated this week