segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
☆1,091 · Updated 3 weeks ago
Alternatives and similar repositories for wtpsplit
Users who are interested in wtpsplit also compare it with the libraries listed below.
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detector that works out of the box. ☆863 · Updated 10 months ago
- NeuSpell: A Neural Spelling Correction Toolkit ☆696 · Updated last year
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,242 · Updated last month
- Open neural machine translation models and web services ☆707 · Updated last month
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,146 · Updated last week
- SpanMarker for Named Entity Recognition ☆437 · Updated 6 months ago
- 🦙 Integrating LLMs into structured NLP pipelines ☆1,279 · Updated 6 months ago
- Bringing BERT into modernity via both architecture changes and scaling ☆1,442 · Updated 2 weeks ago
- String-to-String Algorithms for Natural Language Processing ☆550 · Updated 11 months ago
- 80x faster and 95% accurate language identification with Fasttext ☆158 · Updated last year
- Training open neural machine translation models ☆367 · Updated 4 months ago
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ☆581 · Updated 2 years ago
- State-of-the-art LLM-based translation models. ☆542 · Updated 3 months ago
- Late Interaction Models Training & Retrieval ☆481 · Updated last week
- SGPT: GPT Sentence Embeddings for Semantic Search ☆868 · Updated last year
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆871 · Updated last year
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ☆1,420 · Updated last month
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ☆1,872 · Updated last month
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,499 · Updated last month
- Evaluate your speech-to-text system with similarity measures such as word error rate (WER) ☆757 · Updated 5 months ago
- A Collection of BM25 Algorithms in Python ☆1,208 · Updated 9 months ago
- Efficient few-shot learning with Sentence Transformers ☆2,523 · Updated 3 months ago
- A very simple news crawler with a funny name ☆390 · Updated this week
- Easily embed, cluster and semantically label text datasets ☆556 · Updated last year
- Fast Semantic Text Deduplication & Filtering ☆762 · Updated last month
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆144 · Updated last month
- ☆527 · Updated last year
- Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing ☆757 · Updated 9 months ago
- ⚡️ A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆575 · Updated 2 weeks ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆248 · Updated 2 years ago