segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
★1,126 · Updated 2 months ago
Alternatives and similar repositories for wtpsplit
Users who are interested in wtpsplit are comparing it to the libraries listed below.
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ★871 · Updated last year
- Open neural machine translation models and web services ★717 · Updated 2 months ago
- NeuSpell: A Neural Spelling Correction Toolkit ★696 · Updated 2 years ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ★1,291 · Updated 2 months ago
- 🦙 Integrating LLMs into structured NLP pipelines ★1,301 · Updated 7 months ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ★2,287 · Updated this week
- Training open neural machine translation models ★373 · Updated 5 months ago
- State-of-the-art LLM-based translation models. ★551 · Updated 4 months ago
- Bringing BERT into modernity via both architecture changes and scaling ★1,497 · Updated 2 months ago
- Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagg… ★933 · Updated last year
- SpanMarker for Named Entity Recognition ★451 · Updated 7 months ago
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ★1,477 · Updated 2 months ago
- A Collection of BM25 Algorithms in Python ★1,229 · Updated 10 months ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ★893 · Updated last year
- 80x faster and 95% accurate language identification with Fasttext ★162 · Updated last year
- Easily embed, cluster and semantically label text datasets ★566 · Updated last year
- SGPT: GPT Sentence Embeddings for Semantic Search ★870 · Updated last year
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ★586 · Updated 2 years ago
- Open language modeling toolkit based on PyTorch ★143 · Updated last month
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,523 · Updated 3 months ago
- Efficient few-shot learning with Sentence Transformers ★2,557 · Updated 3 weeks ago
- Late Interaction Models Training & Retrieval ★532 · Updated this week
- ★540 · Updated last year
- Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard ★552 · Updated 5 months ago
- Software that makes labeling PDFs easy. ★418 · Updated last year
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… ★839 · Updated 2 weeks ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. ★814 · Updated last month
- A Neural Framework for MT Evaluation ★646 · Updated 3 weeks ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ★1,927 · Updated 2 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ★851 · Updated last month