hplt-project / sacremoses
Python port of Moses tokenizer, truecaser and normalizer
☆488 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for sacremoses
- A tool for holistic analysis of language generation systems ☆467 · Updated 2 years ago
- Fast BPE ☆656 · Updated 5 months ago
- Simple, fast unsupervised word aligner ☆738 · Updated 2 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆351 · Updated last year
- Unsupervised Statistical Machine Translation ☆228 · Updated 4 years ago
- A neural word aligner based on multilingual BERT ☆328 · Updated 2 years ago
- A framework to learn cross-lingual word embedding mappings ☆645 · Updated last year
- Open-Source Machine Translation Quality Estimation in PyTorch ☆228 · Updated 2 years ago
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons ☆1,068 · Updated 3 months ago
- LASER multilingual sentence embeddings as a pip package ☆225 · Updated last year
- A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary. ☆312 · Updated last month
- Evaluating Cross-lingual Sentence Representations ☆442 · Updated 3 years ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,187 · Updated last month
- ☆360 · Updated last year
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆555 · Updated 2 years ago
- High-accuracy NLP parser with models for 11 languages. ☆871 · Updated 2 years ago
- Open-Source Neural Machine Translation in Tensorflow ☆797 · Updated last year
- A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology… ☆220 · Updated last year
- BERT for Coreference Resolution ☆445 · Updated last year
- Byte Pair Encoding for Python! ☆223 · Updated 2 years ago
- ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences. ☆440 · Updated 7 months ago
- End-to-end Neural Coreference Resolution ☆524 · Updated 2 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆150 · Updated 5 months ago
- Builds wordpiece (subword) vocabulary compatible with Google Research's BERT ☆226 · Updated 3 years ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆725 · Updated 3 months ago
- Easier Automatic Sentence Simplification Evaluation ☆159 · Updated last year
- Text tokenization and sentence segmentation (segtok v2) ☆203 · Updated 2 years ago
- Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models … ☆228 · Updated last year
- Python library & examples for Masked Language Model Scoring (ACL 2020) ☆336 · Updated last year