segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
★1,051 · Updated 2 months ago
Alternatives and similar repositories for wtpsplit
Users who are interested in wtpsplit are comparing it to the libraries listed below.
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ★851 · Updated 9 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ★1,193 · Updated this week
- 🦙 Integrating LLMs into structured NLP pipelines ★1,254 · Updated 4 months ago
- Efficient few-shot learning with Sentence Transformers ★2,486 · Updated last month
- Bringing BERT into modernity via both architecture changes and scaling ★1,385 · Updated 3 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,436 · Updated last week
- NeuSpell: A Neural Spelling Correction Toolkit ★695 · Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ★2,053 · Updated last week
- Open neural machine translation models and web services ★696 · Updated 5 months ago
- Neural Search ★357 · Updated 2 months ago
- SpanMarker for Named Entity Recognition ★431 · Updated 4 months ago
- Fast inference engine for Transformer models ★3,831 · Updated last month
- Easily embed, cluster and semantically label text datasets ★542 · Updated last year
- Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard ★543 · Updated 2 months ago
- Fast Semantic Text Deduplication & Filtering ★697 · Updated last week
- Evaluate your speech-to-text system with similarity measures such as word error rate (WER) ★729 · Updated 3 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ★809 · Updated 6 months ago
- SGPT: GPT Sentence Embeddings for Semantic Search ★868 · Updated last year
- State-of-the-art LLM-based translation models. ★530 · Updated last month
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ★1,824 · Updated this week
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ★361 · Updated last year
- ★521 · Updated 10 months ago
- ⚡ Boost inference speed of T5 models by 5x & reduce the model size by 3x. ★578 · Updated 2 years ago
- Late Interaction Models Training & Retrieval ★395 · Updated last week
- Library for translating between 200 languages. Built on 🤗 transformers. ★480 · Updated 9 months ago
- Process PDFs, Word documents and more with spaCy ★615 · Updated 2 months ago
- Things you can do with the token embeddings of an LLM ★1,443 · Updated 2 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ★315 · Updated 2 months ago
- ★363 · Updated last year
- BLEURT is a metric for Natural Language Generation based on transfer learning. ★733 · Updated last year