alea-institute / nupunkt
Next-generation Punkt sentence boundary detection with zero dependencies
☆24 · Updated 3 months ago
Alternatives and similar repositories for nupunkt
Users who are interested in nupunkt are comparing it to the libraries listed below.
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax. ☆59 · Updated last year
- Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- Small Python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- ☆55 · Updated last year
- Train floret vectors ☆18 · Updated 2 years ago
- spaCy entry points for Curated Transformers ☆32 · Updated 5 months ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆21 · Updated last year
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 · Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆27 · Updated 11 months ago
- ☆30 · Updated 3 years ago
- Generate reports for spaCy models. ☆29 · Updated 3 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ☆21 · Updated last year
- Language detection using spaCy and fastText ☆57 · Updated last year
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts. ☆40 · Updated 6 years ago
- Named entity recognition for the legal domain ☆42 · Updated 4 years ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 · Updated 3 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆35 · Updated last year
- ☆69 · Updated 3 years ago
- Library for fast text representation and classification. ☆31 · Updated last year
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆21 · Updated 2 years ago
- ☆70 · Updated 2 years ago
- Python based Wikidata framework for easy dataframe extraction ☆45 · Updated last year
- Make Thinc faster on macOS by calling into Apple's native Accelerate library ☆101 · Updated 4 months ago
- ☆19 · Updated 4 years ago
- ☆67 · Updated last year
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking ☆86 · Updated 3 years ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- Source code and data for Like a Good Nearest Neighbor ☆30 · Updated 10 months ago
- It's a cooler way to store simple linear models. ☆27 · Updated last year
- Enhanced version of WikiExtractor: a Wikipedia dump extractor ☆24 · Updated last month