Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
☆1,259 · Feb 26, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for wtpsplit
Users who are interested in wtpsplit often compare it to the libraries listed below.
- A sentence segmenter that actually works! ☆304 · Aug 18, 2020 · Updated 5 years ago
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆908 · Aug 20, 2024 · Updated last year
- ✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models ☆37 · Oct 1, 2025 · Updated 5 months ago
- Efficient few-shot learning with Sentence Transformers ☆2,699 · Dec 11, 2025 · Updated 3 months ago
- Fast lexical search implementing BM25 in Python ☆1,589 · Updated this week
- General-Purpose Neural Networks for Sentence Boundary Detection ☆73 · Mar 27, 2023 · Updated 2 years ago
- Text tokenization and sentence segmentation (segtok v2) ☆209 · Mar 12, 2022 · Updated 4 years ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆159 · Jul 14, 2025 · Updated 8 months ago
- State-of-the-Art Text Embeddings ☆18,427 · Mar 12, 2026 · Updated last week
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Sep 2, 2024 · Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,915 · Mar 14, 2026 · Updated last week
- Leveraging BERT and c-TF-IDF to create easily interpretable topics. ☆7,452 · Feb 20, 2026 · Updated last month
- Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing ☆793 · Jul 22, 2025 · Updated 8 months ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. ☆876 · Oct 10, 2025 · Updated 5 months ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,291 · Mar 14, 2026 · Updated last week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,882 · May 17, 2025 · Updated 10 months ago
- A blazing fast inference solution for text embeddings models ☆4,600 · Mar 13, 2026 · Updated last week
- Fast inference engine for Transformer models ☆4,368 · Feb 4, 2026 · Updated last month
- Structured Outputs ☆13,564 · Mar 9, 2026 · Updated 2 weeks ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,791 · Mar 12, 2026 · Updated last week
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆4,896 · Updated this week
- A tokenizer and sentence splitter for German and English web and social media texts. ☆153 · Dec 9, 2024 · Updated last year
- Onset-and-Offset-Aware Sound Event Detection ☆21 · Feb 10, 2025 · Updated last year
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark ☆36 · May 7, 2025 · Updated 10 months ago
- Punctuation restoration and spell correction experiments. ☆253 · Feb 25, 2021 · Updated 5 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,687 · Oct 23, 2024 · Updated last year
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆2,036 · Mar 9, 2026 · Updated last week
- A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher… ☆1,266 · Jul 24, 2025 · Updated 7 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆30 · Jan 25, 2023 · Updated 3 years ago
- A model that predicts the punctuation of English, Italian, French and German texts. ☆84 · Feb 22, 2023 · Updated 3 years ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,131 · Updated this week
- Minimal keyword extraction with BERT ☆4,131 · Feb 3, 2026 · Updated last month
- Data augmentation for NLP ☆4,652 · Jun 24, 2024 · Updated last year
- A Python library for calculating a large variety of metrics from text ☆361 · Jan 30, 2026 · Updated last month
- Zero and Few shot named entity & relationships recognition ☆402 · Sep 17, 2025 · Updated 6 months ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆392 · Nov 7, 2023 · Updated 2 years ago
- ☆30 · Jun 23, 2022 · Updated 3 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆74 · Apr 1, 2025 · Updated 11 months ago
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year