segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
★733 · Updated last week
Related projects
Alternatives and complementary repositories for wtpsplit
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ★808 · Updated 3 months ago
- Punctuation restoration and spell correction experiments. ★248 · Updated 3 years ago
- A sentence segmenter that actually works! ★302 · Updated 4 years ago
- NeuSpell: A Neural Spelling Correction Toolkit ★671 · Updated last year
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ★565 · Updated last year
- ✔️ Contextual word checker for better suggestions (not actively maintained) ★409 · Updated last month
- 🦙 Integrating LLMs into structured NLP pipelines ★1,136 · Updated 3 months ago
- SpanMarker for Named Entity Recognition ★401 · Updated 3 months ago
- ★347 · Updated 8 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ★908 · Updated last week
- Efficient few-shot learning with Sentence Transformers ★2,239 · Updated 2 months ago
- Tools to download and cleanup Common Crawl data ★971 · Updated last year
- A Python module for English lemmatization and inflection. ★261 · Updated last year
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ★1,622 · Updated 3 months ago
- Punctuation Restoration using Transformer Models for High- and Low-Resource Languages ★204 · Updated 3 months ago
- Evaluate your speech-to-text system with similarity measures such as word error rate (WER) ★639 · Updated 2 weeks ago
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ★1,160 · Updated 3 weeks ago
- Text tokenization and sentence segmentation (segtok v2) ★203 · Updated 2 years ago
- A neural word aligner based on multilingual BERT ★328 · Updated 2 years ago
- A collection of large question answering datasets ★337 · Updated 4 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ★230 · Updated 2 years ago
- SGPT: GPT Sentence Embeddings for Semantic Search ★852 · Updated 9 months ago
- ★1,124 · Updated 3 months ago
- ★487 · Updated 9 months ago
- A Python library for calculating a large variety of metrics from text ★315 · Updated last month
- ★147 · Updated 5 months ago
- Autoregressive Entity Retrieval ★765 · Updated last year
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ★183 · Updated last month
- Multilingual sentence alignment using sentence embeddings ★101 · Updated 2 weeks ago
- ★461 · Updated 4 months ago