Helsinki-NLP / subalign
☆16 · Updated 2 years ago
Alternatives and similar repositories for subalign
Users who are interested in subalign are comparing it to the libraries listed below.
- Multilingual sentence alignment using sentence embeddings ☆128 · Updated 11 months ago
- ☆31 · Updated last year
- Improved Sentence Alignment in Linear Time and Space ☆184 · Updated 2 years ago
- Punctuation Restoration using Transformer Models for High- and Low-Resource Languages ☆223 · Updated last year
- OpusFilter - Parallel corpus processing toolkit ☆110 · Updated last month
- Sentence aligner ☆118 · Updated 4 years ago
- Translation demonstrator ☆34 · Updated 5 years ago
- ☆42 · Updated 7 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆160 · Updated last year
- ☆49 · Updated last year
- ☆78 · Updated 2 months ago
- Complementary code for our paper Automatic punctuation restoration with BERT models ☆50 · Updated last year
- NTREX -- News Test References for MT Evaluation ☆85 · Updated last year
- Tool to fix bitexts and tag near-duplicates for removal ☆33 · Updated last month
- ☆12 · Updated 9 years ago
- 📝An easy-to-use package to restore punctuation of the text. ☆119 · Updated 2 years ago
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆226 · Updated last year
- Efficient Low-Memory Aligner ☆146 · Updated 9 months ago
- SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP ☆14 · Updated 4 years ago
- Improving Low-Resource Neural Machine Translation of Related Languages by Transfer Learning ☆19 · Updated 3 years ago
- Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2 ☆114 · Updated 6 years ago
- A toolkit for producing n-gram language models. The highlights are the implementation of Kneser-Ney growing and revised Kneser pruning me… ☆40 · Updated last month
- Bicleaner fork that uses neural networks ☆39 · Updated 4 months ago
- 📈 A forced aligner intended for synchronization of narrated text ☆100 · Updated 2 months ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆76 · Updated 2 years ago
- Code for extracting parallel corpora from pmindia ☆16 · Updated 5 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation ☆28 · Updated 2 years ago
- A python package for deep multilingual punctuation prediction. ☆135 · Updated last year
- Punctuation restoration and spell correction experiments. ☆252 · Updated 4 years ago
- A model that predicts the punctuation of English, Italian, French and German texts. ☆81 · Updated 2 years ago