segment-any-text / wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
☆1,179 Updated 3 weeks ago
Alternatives and similar repositories for wtpsplit
Users who are interested in wtpsplit compare it to the libraries listed below.
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆878 Updated last year
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,368 Updated last month
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ☆1,539 Updated last week
- Open neural machine translation models and web services ☆739 Updated 4 months ago
- NeuSpell: A Neural Spelling Correction Toolkit ☆697 Updated 2 years ago
- State-of-the-art LLM-based translation models. ☆558 Updated 6 months ago
- 🦙 Integrating LLMs into structured NLP pipelines ☆1,328 Updated 9 months ago
- Easily embed, cluster and semantically label text datasets ☆581 Updated last year
- Efficient few-shot learning with Sentence Transformers ☆2,587 Updated 2 months ago
- SpanMarker for Named Entity Recognition ☆460 Updated 9 months ago
- Bringing BERT into modernity via both architecture changes and scaling ☆1,549 Updated 4 months ago
- 80x faster and 95% accurate language identification with Fasttext ☆161 Updated last year
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆253 Updated last month
- A Collection of BM25 Algorithms in Python ☆1,253 Updated last year
- SGPT: GPT Sentence Embeddings for Semantic Search ☆875 Updated last year
- Evaluate your speech-to-text system with similarity measures such as word error rate (WER) ☆814 Updated 8 months ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,459 Updated last week
- 📚 Process PDFs, Word documents and more with spaCy ☆784 Updated 7 months ago
- Open language modeling toolkit based on PyTorch ☆152 Updated 2 weeks ago
- Training open neural machine translation models ☆380 Updated 7 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆212 Updated last month
- String-to-String Algorithms for Natural Language Processing ☆556 Updated last year
- Extract structured text from pdfs quickly ☆614 Updated 4 months ago
- A very simple news crawler with a funny name ☆415 Updated this week
- ☆548 Updated last year
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆931 Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆163 Updated 4 months ago
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ☆587 Updated 2 years ago
- Late Interaction Models Training & Retrieval ☆632 Updated this week
- Multilingual sentence alignment using sentence embeddings ☆127 Updated 11 months ago