hplt-project / sacremoses
Python port of Moses tokenizer, truecaser and normalizer
☆490 Updated 7 months ago
Alternatives and similar repositories for sacremoses:
Users who are interested in sacremoses are comparing it to the libraries listed below.
- A tool for holistic analysis of language generation systems ☆467 Updated 2 years ago
- Simple, fast unsupervised word aligner ☆742 Updated 2 years ago
- Fast BPE ☆659 Updated 7 months ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆353 Updated last year
- A neural word aligner based on multilingual BERT ☆336 Updated 2 years ago
- A framework to learn cross-lingual word embedding mappings ☆648 Updated last year
- ☆360 Updated 2 years ago
- ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences. ☆440 Updated 9 months ago
- Evaluating Cross-lingual Sentence Representations ☆448 Updated 3 years ago
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆636 Updated 2 years ago
- Unsupervised Statistical Machine Translation ☆229 Updated 4 years ago
- Builds a wordpiece (subword) vocabulary compatible with Google Research's BERT ☆227 Updated 4 years ago
- A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary. ☆311 Updated this week
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆152 Updated 7 months ago
- Open-Source Neural Machine Translation in TensorFlow ☆797 Updated 2 years ago
- Open-Source Machine Translation Quality Estimation in PyTorch ☆228 Updated 2 years ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,192 Updated 3 months ago
- Bitextor generates translation memories from multilingual websites ☆293 Updated 2 months ago
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons ☆1,091 Updated last week
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆555 Updated 3 years ago
- ☆460 Updated 3 years ago
- Scripts and configuration files for the Edinburgh neural MT submission to the WMT 16 shared translation task ☆139 Updated 4 years ago
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… ☆429 Updated 2 years ago
- LASER multilingual sentence embeddings as a pip package ☆224 Updated last year
- Topic-Aware Convolutional Neural Networks for Extreme Summarization ☆357 Updated last year
- Resources for the NAACL 2018 paper "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents" ☆364 Updated last year
- TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and … ☆299 Updated 4 years ago
- Easier Automatic Sentence Simplification Evaluation ☆160 Updated last year
- Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models … ☆227 Updated last year
- A Python wrapper for the ROUGE summarization evaluation package ☆252 Updated 3 years ago