microsoft / factored-segmenter
Unsupervised factor-based text tokenizer for natural-language processing applications
☆17Updated 4 years ago
Alternatives and similar repositories for factored-segmenter:
Users who are interested in factored-segmenter are comparing it to the libraries listed below.
- Machine is a natural language processing library for .NET that is focused on providing tools for processing resource-poor languages.☆28Updated this week
- Bicleaner fork that uses neural networks☆39Updated 5 months ago
- NTREX -- News Test References for MT Evaluation☆80Updated 7 months ago
- Microsoft Speech Language Translation (MSLT) Corpus☆20Updated 7 years ago
- A library for minimum Bayes risk (MBR) decoding☆32Updated this week
- OPUS-CAT is a collection of software which makes it possible to use OPUS-MT neural machine translation models in professional translation. OPU…☆74Updated 3 weeks ago
- Port of PragmaticSegmenter for sentence boundary detection☆33Updated 3 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki☆22Updated last month
- Project OCELoT: an Open, Collaborative Evaluation Leaderboard of Translations☆20Updated 6 months ago
- SHAS: Approaching optimal Segmentation for End-to-End Speech Translation☆37Updated last year
- .NET research application for rendering, recording, playback, analysis and compression of audio data.☆10Updated 7 years ago
- C# Sequence to Sequence Learning with Attention using LSTM Neural Networks☆26Updated 7 years ago
- Bilingual sentence similarity classifier using Tensorflow☆19Updated 5 years ago
- A High-Quality Multilingual Dataset for Structured Documentation Translation☆35Updated 6 months ago
- ☆31Updated 2 years ago
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction"☆18Updated 4 years ago
- A library for data streaming and augmentation☆20Updated 10 months ago
- Curriculum training☆16Updated this week
- This tool helps automatically generate grammatically valid synthetic code-mixed data by utilizing linguistic theories such as Equivalenc…☆53Updated 5 months ago
- ☆30Updated 7 months ago
- This code provides a word-level language identification tool for identifying the language of individual words in code-mixed text, e.g. The tex…☆51Updated 4 years ago
- C++ mosestokenizer☆16Updated 10 months ago
- Morfessor EM+Prune☆10Updated 4 years ago
- ☆20Updated last year
- Multilingual and monolingual corpora focused on Caucasus languages for Natural Language Processing (NLP)☆34Updated last month
- A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT☆26Updated 3 years ago
- SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP☆13Updated 3 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT☆21Updated last year
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst…☆22Updated 3 years ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.☆72Updated last year