mediacloud / sentence-splitter
Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
☆234 · Updated 2 years ago
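Heuristic splitters in this family work roughly as follows. A sentence boundary is proposed after `.`, `?` or `!` when the next token starts with an uppercase letter (or a digit, after a period). The boundary is then vetoed if the word before the period is a known non-breaking prefix such as an abbreviation or initial. The following is a minimal, self-contained sketch of rules of that kind. It is not the sentence-splitter package's code or API. The function names and the tiny prefix lists (`NONBREAKING_PREFIXES`, `NUMERIC_ONLY_PREFIXES`) are hypothetical, and real prefix lists are far longer and language-specific.

```python
# Hypothetical, heavily abbreviated English prefix lists (illustration only).
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Ms", "Dr", "Prof", "St", "vs", "etc"}
# Prefixes that block a split only when the next token starts with a digit.
NUMERIC_ONLY_PREFIXES = {"No", "Nos", "pp"}

CLOSERS = "\"')]»’”"   # may follow the final punctuation mark
OPENERS = "\"'([«‘“¿¡"  # may precede the first letter of a sentence


def _starts_sentence(token: str, allow_digit: bool) -> bool:
    """True if token (ignoring leading quotes/brackets) starts with an
    uppercase letter, or with a digit when allow_digit is set."""
    core = token.lstrip(OPENERS)
    if not core:
        return False
    return core[0].isupper() or (allow_digit and core[0].isdigit())


def _is_boundary(word: str, nxt: str) -> bool:
    """Decide whether a sentence ends after `word`, given the next token."""
    core = word.rstrip(CLOSERS)
    if core.endswith(("?", "!")):
        return _starts_sentence(nxt, allow_digit=False)
    if not core.endswith("."):
        return False
    if not _starts_sentence(nxt, allow_digit=True):
        return False
    prefix = core.rstrip(".")
    # Acronyms such as "U.S." contain an inner period plus letters.
    if "." in prefix and any(c.isalpha() for c in prefix):
        return False
    # A single uppercase letter is treated as an initial ("J. Smith").
    if len(prefix) == 1 and prefix.isupper():
        return False
    if prefix in NONBREAKING_PREFIXES:
        return False
    if prefix in NUMERIC_ONLY_PREFIXES and nxt.lstrip(OPENERS)[:1].isdigit():
        return False
    return True


def split_sentences(text: str) -> list[str]:
    """Split whitespace-tokenized text into sentences using the rules above."""
    words = text.split()
    sentences: list[str] = []
    current: list[str] = []
    for i, word in enumerate(words):
        current.append(word)
        if i + 1 < len(words) and _is_boundary(word, words[i + 1]):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Mr. Smith arrived at 5 p.m. yesterday. "
                      "He met Dr. Jones in the U.S. office! Was it planned? Yes."))
# → ['Mr. Smith arrived at 5 p.m. yesterday.', 'He met Dr. Jones in the U.S. office!', 'Was it planned?', 'Yes.']
```

Because the rules look only at the token before and after the punctuation, they are fast and language-agnostic apart from the prefix lists. The trade-off is ambiguity. For example, "U.S." followed by a capitalized word is never split, even when it really does end a sentence.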
Alternatives and similar repositories for sentence-splitter:
Users interested in sentence-splitter are comparing it to the libraries listed below.
- Text tokenization and sentence segmentation (segtok v2) ☆203 · Updated 2 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆152 · Updated 7 months ago
- Improved Sentence Alignment in Linear Time and Space ☆163 · Updated last year
- OpusFilter - Parallel corpus processing toolkit ☆104 · Updated this week
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆151 · Updated last month
- Language-independent truecaser in Python. ☆161 · Updated 3 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆353 · Updated last year
- Efficient Low-Memory Aligner ☆140 · Updated this week
- Sentence aligner ☆109 · Updated 3 years ago
- Bilingual term extractor ☆52 · Updated last year
- Multilingual sentence alignment using sentence embeddings ☆106 · Updated 2 months ago
- A neural word aligner based on multilingual BERT ☆336 · Updated 2 years ago
- LASER multilingual sentence embeddings as a pip package ☆224 · Updated last year
- Bitextor generates translation memories from multilingual websites ☆293 · Updated 2 months ago
- ☆44 · Updated 5 months ago
- A Python module for English lemmatization and inflection. ☆265 · Updated last year
- spaCy + UDPipe ☆161 · Updated 2 years ago
- A Python truecasing utility that restores case information in text ☆88 · Updated 2 years ago
- A modern, interlingual wordnet interface for Python ☆229 · Updated last month
- A minimal, pure Python library to interface with CoNLL-U format files. ☆149 · Updated last year
- spacy-wordnet creates annotations that make it easy to use WordNet and WordNet Domains through the NLTK WordNet interface ☆253 · Updated 4 months ago
- A Python module for word inflections designed for use with spaCy. ☆92 · Updated 4 years ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆412 · Updated last month
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆187 · Updated 4 years ago
- Implementation of the ClausIE information extraction system for Python + spaCy ☆220 · Updated 2 years ago
- Transformer based translation quality estimation ☆107 · Updated last year
- Automatic extraction of edited sentences from text edition histories. ☆82 · Updated 2 years ago
- A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology… ☆221 · Updated 2 years ago
- Translation Memory Open-source Purifier ☆33 · Updated 2 years ago
- Machine-Translation-based sentence alignment tool for parallel text ☆304 · Updated 3 years ago