alea-institute / nupunkt
Next-generation Punkt sentence boundary detection with zero dependencies
☆16 · Updated last month
Alternatives and similar repositories for nupunkt:
Users interested in nupunkt are comparing it to the libraries listed below.
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆13 · Updated 8 months ago
- Small python package to measure OCR quality and other related metrics. ☆21 · Updated last year
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆59 · Updated last year
- spaCy entry points for Curated Transformers ☆29 · Updated 7 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆24 · Updated 5 months ago
- Fastlaw aims to replace generic word embeddings in supervised machine learning NLP tasks on legal texts. ☆38 · Updated 5 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- 🔢 Work with static vector models ☆28 · Updated 2 weeks ago
- ☆54 · Updated last year
- ☆67 · Updated last year
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆17 · Updated 8 months ago
- Plug-and-play document processing pipelines. No training. Batteries included. ☆57 · Updated last week
- 🌸 Train floret vectors ☆18 · Updated 2 years ago
- Efficient few-shot learning with cross-encoders. ☆51 · Updated last year
- ☆23 · Updated 3 months ago
- Truly flash implementation of DeBERTa's disentangled attention mechanism. ☆46 · Updated 3 weeks ago
- A simple library for segmenting legal texts ☆15 · Updated 2 years ago
- Source code and data for Like a Good Nearest Neighbor ☆28 · Updated 3 months ago
- ☆30 · Updated 2 years ago
- A BERT-based application for reusable text classification at scale ☆38 · Updated last year
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 · Updated last year
- Python library to use Pleias-RAG models ☆36 · Updated this week
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated 2 years ago
- spaCy extension for Visual Studio Code ☆30 · Updated last month
- ☆18 · Updated 3 years ago
- PyTorch implementation of a BiLSTM model for the Wikification project. ☆19 · Updated 5 years ago
- CLI that queries multiple language models in parallel using prompts from a CSV file ☆26 · Updated this week
- NLP with Rust for Python 🦀🐍 ☆62 · Updated 11 months ago
- Metadata Extractor & Loader (MEL) ■ The NLP-NER Toolkit (TNNT) ☆23 · Updated 2 years ago
- API client for fetching and comparing passages from legislation ☆11 · Updated 3 months ago