segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
☆ 1,197 · Updated last week
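To make the task concrete, here is a deliberately naive baseline that splits on terminal punctuation with a regular expression. The function `naive_split` is hypothetical and is not how wtpsplit works, since wtpsplit relies on trained models. The example shows the kind of error that robust segmenters are designed to avoid: the abbreviation "Dr." is wrongly treated as the end of a sentence.

```python
import re


def naive_split(text: str) -> list[str]:
    """Hypothetical baseline: split after '.', '!' or '?' when whitespace follows.

    Abbreviations, decimals and missing punctuation are not handled at all.
    """
    # The lookbehind keeps the punctuation attached to the preceding sentence
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the input is blank
    return [p for p in parts if p]


print(naive_split("Dr. Smith arrived. He sat down!"))
# -> ['Dr.', 'Smith arrived.', 'He sat down!']
```

The split after "Dr." is the failure that abbreviation rules (pySBD) or learned boundary models (wtpsplit) aim to fix.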
Alternatives and similar repositories for wtpsplit
Users interested in wtpsplit are comparing it to the libraries listed below.
- pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out of the box · ☆ 886 · Updated last year
- Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy · ☆ 1,403 · Updated last week
- NeuSpell: A Neural Spelling Correction Toolkit · ☆ 700 · Updated 2 years ago
- A Collection of BM25 Algorithms in Python · ☆ 1,270 · Updated last year
- SPLADE: sparse neural search (SIGIR21, SIGIR22) · ☆ 948 · Updated last year
- Integrating LLMs into structured NLP pipelines · ☆ 1,349 · Updated 10 months ago
- Generalist and lightweight model for Named Entity Recognition (extracts any entity type from text) @ NAACL 2024 · ☆ 2,550 · Updated this week
- Bringing BERT into modernity via both architecture changes and scaling · ☆ 1,572 · Updated 5 months ago
- Training open neural machine translation models · ☆ 384 · Updated 8 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… · ☆ 888 · Updated 2 months ago
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text · ☆ 1,570 · Updated last week
- State-of-the-art LLM-based translation models · ☆ 566 · Updated 7 months ago
- SpanMarker for Named Entity Recognition · ☆ 463 · Updated 10 months ago
- Open neural machine translation models and web services · ☆ 747 · Updated last week
- Late Interaction Models Training & Retrieval · ☆ 656 · Updated 2 weeks ago
- 80x faster and 95% accurate language identification with fastText · ☆ 162 · Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models · ☆ 1,575 · Updated 6 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks · ☆ 430 · Updated last month
- SGPT: GPT Sentence Embeddings for Semantic Search · ☆ 872 · Updated last year
- Train and Infer Powerful Sentence Embeddings with AnglE | SOTA on STS and MTEB Leaderboard · ☆ 560 · Updated last month
- String-to-String Algorithms for Natural Language Processing · ☆ 561 · Updated last year
- Fast, accurate, lightweight Python library for state-of-the-art embeddings · ☆ 2,524 · Updated this week
- Easily embed, cluster and semantically label text datasets · ☆ 584 · Updated last year
- A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion · ☆ 621 · Updated 3 months ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets · ☆ 2,010 · Updated last month
- Contextual word checker for better suggestions (not actively maintained) · ☆ 418 · Updated 10 months ago
- Evaluate your speech-to-text system with similarity measures such as word error rate (WER) · ☆ 820 · Updated 9 months ago
- ☆ 552 · Updated last year
- Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing · ☆ 774 · Updated 4 months ago
- Fast Semantic Text Deduplication & Filtering · ☆ 848 · Updated last month
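Two of the entries above (the Numba-based BM25 library and the BM25 algorithm collection) implement BM25 lexical ranking. The scoring function they build on can be sketched in plain Python. The sketch assumes the common Okapi BM25 formulation with a Lucene-style non-negative IDF and the frequently used defaults `k1=1.5` and `b=0.75`. The function name `bm25_scores` is hypothetical. The listed libraries offer several BM25 variants, and their IDF formulas and defaults may differ.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Hypothetical Okapi BM25 sketch: one score per tokenized document."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # absent terms contribute nothing
            # Lucene-style IDF, which never goes negative
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Saturating term frequency, normalized by document length
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


docs = [["the", "cat", "sat"], ["the", "dog", "barked"], ["a", "cat", "and", "a", "cat"]]
print(bm25_scores(["cat"], docs))
```

With this input, the second document scores 0 because it lacks "cat". The third document scores highest: it contains "cat" twice, and `k1` and `b` keep the gain from repetition and length from growing linearly.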