mediacloud / sentence-splitter
Text-to-sentence splitter using the heuristic algorithm by Philipp Koehn and Josh Schroeder.
☆230 Updated 2 years ago
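To give a rough idea of what a heuristic splitter of this kind does, here is a minimal sketch in Python. It is not the project's code or API. It assumes a simplified rule: break after `.`, `!` or `?` when followed by whitespace and an uppercase letter, unless the period ends a known abbreviation. The names `NON_BREAKING_PREFIXES` and `split_sentences` and the short abbreviation list are hypothetical, chosen for illustration only.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (illustration only).
NON_BREAKING_PREFIXES = {"Mr", "Mrs", "Ms", "Dr", "Prof", "vs", "etc"}

# Candidate boundary: ., ! or ? followed by whitespace and an uppercase letter.
_BOUNDARY = re.compile(r"([.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text into sentences with a simplified prefix-based heuristic."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # Last whitespace-delimited token before the punctuation mark.
        tokens = text[start:match.start(1)].split()
        last_token = tokens[-1] if tokens else ""
        # A period right after a known abbreviation is not a sentence end.
        if match.group(1) == "." and last_token in NON_BREAKING_PREFIXES:
            continue
        sentences.append(text[start:match.end(1)].strip())
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith arrived late. He apologized!"):
        print(sentence)
```

A real splitter of this family typically ships per-language prefix lists and handles quotes, numbers and other edge cases that this sketch ignores.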
Related projects
Alternatives and complementary repositories for sentence-splitter
- Text tokenization and sentence segmentation (segtok v2) ☆203 Updated 2 years ago
- OpusFilter - Parallel corpus processing toolkit ☆102 Updated 3 months ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆150 Updated 5 months ago
- LASER multilingual sentence embeddings as a pip package ☆225 Updated last year
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆351 Updated last year
- A neural word aligner based on multilingual BERT ☆328 Updated 2 years ago
- A python module for English lemmatization and inflection. ☆261 Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆144 Updated this week
- Improved Sentence Alignment in Linear Time and Space ☆163 Updated last year
- Multilingual sentence alignment using sentence embeddings ☆101 Updated 2 weeks ago
- Easier Automatic Sentence Simplification Evaluation ☆159 Updated last year
- Efficient Low-Memory Aligner ☆139 Updated 2 months ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models ☆155 Updated last year
- A tokenizer and sentence splitter for German and English web and social media texts. ☆135 Updated 3 months ago
- Sentence aligner ☆108 Updated 3 years ago
- A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology… ☆220 Updated last year
- ☆165 Updated 5 months ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆409 Updated last month
- Bitextor generates translation memories from multilingual websites ☆291 Updated last week
- LexRank algorithm for text summarization ☆229 Updated 7 months ago
- This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences fro… ☆157 Updated last month
- Implementation of the ClausIE information extraction system for python+spacy ☆220 Updated 2 years ago
- A modern, interlingual wordnet interface for Python ☆221 Updated last week
- Transformer based translation quality estimation ☆107 Updated last year
- Language independent truecaser in Python. ☆161 Updated 3 years ago
- coFR: COreference resolution tool for FRench (and singletons). ☆24 Updated 4 years ago
- Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python ☆268 Updated last year
- A minimal, pure Python library to interface with CoNLL-U format files. ☆149 Updated last year
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ☆112 Updated 6 months ago
- A python module for word inflections designed for use with spaCy. ☆92 Updated 4 years ago