mediacloud / sentence-splitter
Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
☆245 Updated 2 years ago
Alternatives and similar repositories for sentence-splitter:
Users who are interested in sentence-splitter compare it to the libraries listed below:
- Text tokenization and sentence segmentation (segtok v2) ☆202 Updated 3 years ago
- OpusFilter - Parallel corpus processing toolkit ☆104 Updated last month
- Improved Sentence Alignment in Linear Time and Space ☆170 Updated 2 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆157 Updated 10 months ago
- Efficient Low-Memory Aligner ☆143 Updated 3 months ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆360 Updated last year
- Sentence aligner ☆112 Updated 3 years ago
- LASER multilingual sentence embeddings as a pip package ☆223 Updated last year
- Bilingual term extractor ☆53 Updated last year
- Multilingual sentence alignment using sentence embeddings ☆116 Updated 6 months ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆413 Updated 3 months ago
- Language independent truecaser in Python. ☆160 Updated 3 years ago
- ☆169 Updated last month
- A tokenizer and sentence splitter for German and English web and social media texts. ☆142 Updated 4 months ago
- ☆47 Updated 9 months ago
- Bitextor generates translation memories from multilingual websites ☆292 Updated 5 months ago
- A tool that locates, downloads, and extracts machine translation corpora ☆154 Updated last week
- A neural word aligner based on multilingual BERT ☆346 Updated 3 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆154 Updated 5 months ago
- A python module for word inflections designed for use with spaCy. ☆92 Updated 5 years ago
- A python true casing utility that restores case information for texts ☆88 Updated 2 years ago
- coFR: COreference resolution tool for FRench (and singletons). ☆24 Updated 4 years ago
- Google USE (Universal Sentence Encoder) for spaCy ☆184 Updated 2 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆151 Updated last year
- Easier Automatic Sentence Simplification Evaluation ☆160 Updated last year
- ☆72 Updated last month
- This packages up data for the Open Multilingual Wordnet ☆48 Updated 2 weeks ago
- A sentence segmenter that actually works! ☆306 Updated 4 years ago
- A python module for English lemmatization and inflection. ☆268 Updated last year
- A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology… ☆223 Updated 2 years ago