pln-fing-udelar / jojajovai
Jojajovai Guarani-Spanish Parallel Corpus
☆16 · Updated 3 years ago
Alternatives and similar repositories for jojajovai
Users who are interested in jojajovai are comparing it to the libraries listed below.
- ☆45 · Updated 3 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆160 · Updated last year
- TUFS Asian Language Parallel Corpus ☆51 · Updated 2 years ago
- NTREX -- News Test References for MT Evaluation ☆85 · Updated last year
- A tool that locates, downloads, and extracts machine translation corpora ☆159 · Updated last month
- A neural word aligner based on multilingual BERT ☆358 · Updated 3 years ago
- OpusFilter - Parallel corpus processing toolkit ☆110 · Updated last month
- Easier Automatic Sentence Simplification Evaluation ☆162 · Updated 2 years ago
- Transformer-based translation quality estimation ☆114 · Updated 2 years ago
- spaCy + UDPipe ☆163 · Updated 3 years ago
- A French sequence-to-sequence pretrained model ☆62 · Updated 3 years ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆152 · Updated this week
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆379 · Updated last year
- Efficient Low-Memory Aligner ☆146 · Updated 9 months ago
- ☆49 · Updated last year
- Python-based implementation of the Translate-Align-Retrieve method to automatically translate the SQuAD Dataset to Spanish. ☆59 · Updated 2 years ago
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ☆32 · Updated 7 months ago
- A tool for calculating character n-gram F-score ☆74 · Updated 2 years ago
- ☆26 · Updated last year
- Open-Source Machine Translation Quality Estimation in PyTorch ☆231 · Updated 3 years ago
- Improved Sentence Alignment in Linear Time and Space ☆184 · Updated 2 years ago
- Creating super-parallel corpora in more than 1,500 unique languages for NLP research ☆34 · Updated 2 years ago
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing ☆102 · Updated last year
- This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences fro… ☆161 · Updated last year
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models ☆154 · Updated 2 years ago
- This is a simple Python package for calculating a variety of lexical diversity indices ☆81 · Updated 2 years ago
- The FLORES+ Machine Translation Benchmark ☆108 · Updated 11 months ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆38 · Updated 3 years ago
- ☆64 · Updated 2 years ago
- LASER multilingual sentence embeddings as a pip package ☆225 · Updated 2 years ago