pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection library that works out of the box.
☆904 · Aug 20, 2024 · Updated last year
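To make "rule-based sentence boundary detection" concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not pySBD's rule set or API: the function name `split_sentences` and the tiny `ABBREVIATIONS` set are hypothetical, chosen only for illustration, and real rule sets cover far more cases.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (illustration only).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "p", "e.g", "i.e", "etc"}


def split_sentences(text):
    """Split text at ., ! or ? unless a simple rule says the mark is not a boundary."""
    sentences = []
    start = 0
    # Candidate boundaries: terminal punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]+\s+", text):
        words = text[start:match.start()].split()
        last_word = words[-1] if words else ""
        is_abbreviation = last_word.lower() in ABBREVIATIONS
        is_initial = len(last_word) == 1 and last_word.isupper()
        # Rule: a period after a known abbreviation or a single-letter
        # initial does not end the sentence.
        if match.group().startswith(".") and (is_abbreviation or is_initial):
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("My name is Jonas E. Smith. Please turn to p. 55. Thanks!"))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.', 'Thanks!']
```

A naive split on every period would break after "E." and "p."; the two rules above are what keep those fragments attached. The sketch still fails on many inputs (for example, a sentence that genuinely ends in a single capital letter), which is the kind of case a full rule set has to handle.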
Alternatives and similar repositories for pySBD
Users who are interested in pySBD are comparing it to the libraries listed below.
- PYthon Automated Term Extraction ☆318 · Feb 8, 2023 · Updated 3 years ago
- Text tokenization and sentence segmentation (segtok v2) ☆209 · Mar 12, 2022 · Updated 3 years ago
- spaCy pipeline object for negating concepts in text ☆282 · Jun 16, 2025 · Updated 8 months ago
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆1,245 · Feb 26, 2026 · Updated last week
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- Fuzzy matching and more functionality for spaCy. ☆259 · Jul 6, 2024 · Updated last year
- Implementation of the ClausIE information extraction system for python+spacy ☆226 · Aug 8, 2022 · Updated 3 years ago
- Contextual word checker for better suggestions (not actively maintained) ☆418 · Jan 31, 2025 · Updated last year
- Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy ☆1,402 · Nov 7, 2025 · Updated 3 months ago
- NLP, before and after spaCy ☆2,235 · Sep 22, 2023 · Updated 2 years ago
- Fuzzy string matching, grouping, and evaluation. ☆791 · Jul 10, 2025 · Updated 7 months ago
- A spaCy pipeline and model for NLP on unstructured legal text. ☆674 · Jul 16, 2024 · Updated last year
- A full spaCy pipeline and models for scientific/biomedical documents. ☆1,926 · Dec 4, 2025 · Updated 3 months ago
- A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher… ☆1,265 · Jul 24, 2025 · Updated 7 months ago
- Active Learning for Text Classification in Python ☆639 · Feb 1, 2026 · Updated last month
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆199 · Dec 18, 2022 · Updated 3 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,209 · Feb 15, 2026 · Updated 2 weeks ago
- Python package for text cleaning ☆1,002 · Jan 28, 2026 · Updated last month
- Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆746 · Aug 15, 2024 · Updated last year
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,354 · Oct 27, 2025 · Updated 4 months ago
- Minimal keyword extraction with BERT ☆4,121 · Feb 3, 2026 · Updated last month
- Top2Vec learns jointly embedded topic, document and word vectors. ☆3,108 · Nov 14, 2024 · Updated last year
- SpikeX - SpaCy Pipes for Knowledge Extraction ☆403 · Jul 30, 2021 · Updated 4 years ago
- A fast, efficient universal vector embedding utility package. ☆1,655 · Aug 3, 2023 · Updated 2 years ago
- Efficient few-shot learning with Sentence Transformers ☆2,688 · Dec 11, 2025 · Updated 2 months ago
- Named Entity Recognition based on dictionaries ☆240 · Mar 3, 2019 · Updated 7 years ago
- Data augmentation for NLP ☆4,645 · Jun 24, 2024 · Updated last year
- State-of-the-Art Text Embeddings ☆18,323 · Feb 27, 2026 · Updated last week
- ☆70 · Nov 30, 2022 · Updated 3 years ago
- Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering. ☆1,752 · Dec 20, 2023 · Updated 2 years ago
- Information extraction from English and German texts based on predicate logic ☆394 · Jul 8, 2022 · Updated 3 years ago
- Language-Agnostic SEntence Representations ☆3,659 · May 2, 2024 · Updated last year
- Leveraging BERT and c-TF-IDF to create easily interpretable topics. ☆7,426 · Feb 20, 2026 · Updated 2 weeks ago
- High-accuracy NLP parser with models for 11 languages. ☆907 · Jan 10, 2022 · Updated 4 years ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆220 · Jan 20, 2025 · Updated last year
- Single-document unsupervised keyword extraction ☆1,825 · Feb 11, 2026 · Updated 3 weeks ago
- A python module for English lemmatization and inflection. ☆273 · Sep 14, 2023 · Updated 2 years ago
- A Python library for calculating a large variety of metrics from text ☆360 · Jan 30, 2026 · Updated last month
- Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing ☆790 · Jul 22, 2025 · Updated 7 months ago