uds-lsv / Noisy-Channel-Spell-Checker
A tool for correcting misspellings in textual input using the Noisy Channel Model.
☆11 Updated 4 years ago
Alternatives and similar repositories for Noisy-Channel-Spell-Checker:
Users who are interested in Noisy-Channel-Spell-Checker are comparing it to the libraries listed below:
- Python package for calculating famous measures in computational linguistics ☆13 Updated 3 months ago
- Multilingual Open Text ☆25 Updated 3 months ago
- ☆17 Updated last year
- Temporarily remove unused tokens during training to save RAM and improve speed. ☆22 Updated 7 months ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 Updated 3 years ago
- bin files ☆13 Updated 3 weeks ago
- Several algorithms for converting dependency structures into constituency structures. ☆10 Updated 3 years ago
- GC4LM: A Colossal (Biased) language model for German ☆13 Updated 3 years ago
- A simple neural truecaser written in PyTorch and AllenNLP. ☆33 Updated 8 months ago
- zero-vocab or low-vocab embeddings ☆18 Updated 2 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ☆33 Updated 4 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆35 Updated 2 years ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions ☆19 Updated last year
- A toolkit for producing n-gram language models. The highlights are the implementation of Kneser-Ney growing and revised Kneser pruning me… ☆40 Updated 5 months ago
- Tool for parsing and converting various span encoding schemes. ☆22 Updated last year
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆37 Updated 3 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆35 Updated this week
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 Updated last year
- A small repository to test Captum Explainable AI with a trained Flair transformers-based text classifier. ☆26 Updated 3 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆12 Updated last year
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 Updated 3 years ago
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 Updated 2 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 Updated 2 weeks ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆14 Updated 5 months ago
- An accurate multilingual word aligner based on LaBSE ☆20 Updated last year
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆22 Updated last month
- Spell checker using Brill and Moore's noisy channel error model ☆11 Updated 6 years ago
- UniParse: A universal graph-based parsing toolkit ☆10 Updated 5 years ago