segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
☆1,032 Updated last month
Alternatives and similar repositories for wtpsplit
Users that are interested in wtpsplit are comparing it to the libraries listed below
- pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box. ☆850 Updated 8 months ago
- Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy ☆1,153 Updated 3 weeks ago
- SpanMarker for Named Entity Recognition ☆429 Updated 4 months ago
- Bringing BERT into modernity via both architecture changes and scaling ☆1,358 Updated this week
- Efficient few-shot learning with Sentence Transformers ☆2,479 Updated last month
- A Collection of BM25 Algorithms in Python ☆1,164 Updated 7 months ago
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ☆578 Updated 2 years ago
- NeuSpell: A Neural Spelling Correction Toolkit ☆694 Updated last year
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ☆1,345 Updated last month
- Neural Search ☆355 Updated 2 months ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ☆1,800 Updated 2 months ago
- ☆516 Updated 10 months ago
- SGPT: GPT Sentence Embeddings for Semantic Search ☆866 Updated last year
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆360 Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,001 Updated last week
- Tools to download and cleanup Common Crawl data ☆1,007 Updated 2 years ago
- 80x faster and 95% accurate language identification with fastText ☆153 Updated last year
- Integrating LLMs into structured NLP pipelines ☆1,245 Updated 4 months ago
- Fast inference engine for Transformer models ☆3,797 Updated last month
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆842 Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,411 Updated last week
- Easily embed, cluster and semantically label text datasets ☆534 Updated last year
- Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer… ☆388 Updated 2 years ago
- A modern, interlingual wordnet interface for Python ☆243 Updated last week
- Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' ☆1,510 Updated 3 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆246 Updated 2 years ago
- Whisper with Medusa heads ☆833 Updated 2 weeks ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,044 Updated this week
- Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing ☆753 Updated 7 months ago
- Process PDFs, Word documents and more with spaCy ☆589 Updated 2 months ago