Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
★1,299 · Apr 11, 2026 · Updated 2 months ago
Alternatives and similar repositories for wtpsplit
Users that are interested in wtpsplit are comparing it to the libraries listed below.
- A sentence segmenter that actually works! · ★304 · Aug 18, 2020 · Updated 5 years ago
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. · ★917 · Aug 20, 2024 · Updated last year
- Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models · ★39 · May 2, 2026 · Updated last month
- Efficient few-shot learning with Sentence Transformers · ★2,746 · May 26, 2026 · Updated 2 weeks ago
- Fast BM25 search in Python, powered by Numpy and Numba · ★1,703 · Updated this week
- General-Purpose Neural Networks for Sentence Boundary Detection · ★73 · Mar 27, 2023 · Updated 3 years ago
- Text tokenization and sentence segmentation (segtok v2) · ★209 · Mar 12, 2022 · Updated 4 years ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… · ★161 · Jul 14, 2025 · Updated 11 months ago
- State-of-the-Art Embeddings, Retrieval, and Reranking · ★18,805 · Updated this week
- skweak: A software toolkit for weak supervision applied to NLP tasks · ★926 · Sep 2, 2024 · Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) · ★3,263 · Jun 2, 2026 · Updated 2 weeks ago
- Leveraging BERT and c-TF-IDF to create easily interpretable topics. · ★7,671 · May 13, 2026 · Updated last month
- Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing · ★795 · Jul 22, 2025 · Updated 10 months ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. · ★896 · Oct 10, 2025 · Updated 8 months ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows · ★12,642 · Jun 8, 2026 · Updated last week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… · ★3,935 · May 17, 2025 · Updated last year
- Fast inference engine for Transformer models · ★4,517 · Jun 7, 2026 · Updated last week
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets · ★4,996 · Jun 8, 2026 · Updated last week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. · ★1,621 · Dec 20, 2025 · Updated 5 months ago
- A blazing fast inference solution for text embeddings models · ★4,861 · May 26, 2026 · Updated 2 weeks ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding · ★3,033 · Updated this week
- Structured Outputs · ★13,947 · May 18, 2026 · Updated 3 weeks ago
- A tokenizer and sentence splitter for German and English web and social media texts. · ★152 · Dec 9, 2024 · Updated last year
- Onset-and-Offset-Aware Sound Event Detection · ★21 · Feb 10, 2025 · Updated last year
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark · ★38 · May 7, 2025 · Updated last year
- Punctuation restoration and spell correction experiments. · ★254 · Feb 25, 2021 · Updated 5 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 · ★1,687 · Oct 23, 2024 · Updated last year
- A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher… · ★1,270 · Jul 24, 2025 · Updated 10 months ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. · ★2,086 · Jun 8, 2026 · Updated last week
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning · ★30 · Jan 25, 2023 · Updated 3 years ago
- A model that predicts the punctuation of English, Italian, French and German texts. · ★89 · Apr 21, 2026 · Updated last month
- Minimal keyword extraction with BERT · ★4,187 · May 13, 2026 · Updated last month
- Data augmentation for NLP · ★4,658 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… · ★3,251 · Jun 8, 2026 · Updated last week
- A Python library for calculating a large variety of metrics from text · ★366 · May 5, 2026 · Updated last month
- Zero and Few shot named entity & relationships recognition · ★401 · Sep 17, 2025 · Updated 8 months ago
- ★29 · Jun 23, 2022 · Updated 3 years ago
- Active Learning for Text Classification in Python · ★644 · May 24, 2026 · Updated 3 weeks ago
- [EMNLP 2020] Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) · ★397 · Nov 7, 2023 · Updated 2 years ago