KoichiYasuoka / esupar
Tokenizer POS-Tagger and Dependency-parser with BERT/RoBERTa/DeBERTa/GPT models for Japanese and other languages
☆52 · Updated 2 months ago
Alternatives and similar repositories for esupar
Users who are interested in esupar are comparing it to the libraries listed below.
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ☆31 · Updated 4 years ago
- Tokenizer POS-tagger and Dependency-parser for Classical Chinese ☆14 · Updated 4 months ago
- OpusFilter - Parallel corpus processing toolkit ☆112 · Updated last week
- Code for paper "Kanbun-LM: Reading and Translating Classical Chinese in Japanese Method by Language Models" ☆18 · Updated 2 years ago
- TUFS Asian Language Parallel Corpus ☆51 · Updated 2 years ago
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022) ☆14 · Updated 3 years ago
- Utility scripts for preprocessing Wikipedia texts for NLP ☆78 · Updated last year
- The Business Scene Dialogue corpus ☆70 · Updated 4 years ago
- Multilingual sentence alignment using sentence embeddings ☆130 · Updated last year
- An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models. ☆104 · Updated 4 years ago
- cLang-8 is a dataset for grammatical error correction. ☆110 · Updated 3 years ago
- Code and models used in "MUSS Multilingual Unsupervised Sentence Simplification by Mining Paraphrases". ☆99 · Updated 2 years ago
- An accurate multilingual word aligner based on LaBSE ☆24 · Updated 2 years ago
- allennlp-light is a port of AllenNLP's core modules and nn portions into a standalone package with minimum dependencies ☆55 · Updated 3 years ago
- A large parallel corpus of English and Japanese ☆86 · Updated 8 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 · Updated last year
- ICU based universal language tokenizer ☆34 · Updated 3 years ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ☆98 · Updated 2 years ago
- ☆32 · Updated 2 years ago
- Japanese data from the Google UDT 2.0. ☆38 · Updated last week
- Code to pre-train Japanese T5 models ☆40 · Updated 4 years ago
- MultiCite code and data. Models are available on Huggingface. ☆32 · Updated 3 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆79 · Updated 3 years ago
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆67 · Updated last week
- ☆57 · Updated 2 years ago
- ☆36 · Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated last month
- ☆17 · Updated 2 years ago
- mSimCSE: Multilingual SimCSE ☆34 · Updated 3 years ago
- Repository to collect and categorize Grammatical Error Correction papers. ☆121 · Updated 3 months ago