segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
★1,109 · Updated last month
Alternatives and similar repositories for wtpsplit
Users who are interested in wtpsplit are comparing it to the libraries listed below.
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ★1,267 · Updated 2 months ago
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection tool that works out-of-the-box. ★864 · Updated 11 months ago
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ★1,450 · Updated last month
- Open neural machine translation models and web services ★713 · Updated last month
- Bringing BERT into modernity via both architecture changes and scaling ★1,473 · Updated last month
- NeuSpell: A Neural Spelling Correction Toolkit ★696 · Updated 2 years ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ★884 · Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ★2,206 · Updated this week
- Training open neural machine translation models ★369 · Updated 4 months ago
- State-of-the-art LLM-based translation models. ★548 · Updated 3 months ago
- A Collection of BM25 Algorithms in Python ★1,217 · Updated 10 months ago
- 🦙 Integrating LLMs into structured NLP pipelines ★1,289 · Updated 7 months ago
- SpanMarker for Named Entity Recognition ★444 · Updated 7 months ago
- SGPT: GPT Sentence Embeddings for Semantic Search ★869 · Updated last year
- Easily embed, cluster and semantically label text datasets ★560 · Updated last year
- Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard ★550 · Updated 4 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,505 · Updated 2 months ago
- Late Interaction Models Training & Retrieval ★521 · Updated 3 weeks ago
- Efficient few-shot learning with Sentence Transformers ★2,534 · Updated 3 months ago
- String-to-String Algorithms for Natural Language Processing ★551 · Updated last year
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ★1,908 · Updated 2 months ago
- ⚡️ A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ★579 · Updated 2 weeks ago
- A Neural Framework for MT Evaluation ★642 · Updated this week
- ★1,232 · Updated last year
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ★842 · Updated last month
- 80x faster and 95% accurate language identification with Fasttext ★160 · Updated last year
- Easy to use, state-of-the-art Neural Machine Translation for 100+ languages ★1,240 · Updated last year
- All-in-one text de-duplication ★706 · Updated 2 weeks ago
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ★461 · Updated this week
- ★535 · Updated last year