alea-institute / nupunkt
Next-generation Punkt sentence boundary detection with zero dependencies
☆17 · Updated 3 months ago
Alternatives and similar repositories for nupunkt
Users who are interested in nupunkt are comparing it to the libraries listed below.
- Small Python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- A simple library for segmenting legal texts ☆17 · Updated 2 years ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆13 · Updated 11 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- Fastlaw's purpose is to replace generic word embeddings in supervised machine learning NLP tasks on legal texts. ☆38 · Updated 6 years ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax. ☆59 · Updated last year
- 🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library ☆98 · Updated last month
- ☆55 · Updated last year
- A prototype of a multilingual suite for named-entity recognition in Python. ☆21 · Updated last year
- 🌸 Train floret vectors ☆18 · Updated 2 years ago
- 📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF ☆39 · Updated 3 years ago
- Named entity recognition for the legal domain ☆42 · Updated 4 years ago
- ☆18 · Updated 4 years ago
- Python-based Wikidata framework for easy dataframe extraction ☆45 · Updated last year
- Language detection using spaCy and fastText ☆57 · Updated last year
- A Python package to simulate typographical errors. ☆36 · Updated last year
- spaCy entry points for Curated Transformers ☆32 · Updated 2 months ago
- An open-source Python package for cleaning raw text data ☆70 · Updated last year
- ☆30 · Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆164 · Updated 2 years ago
- Mining Legal Arguments in Court Decisions: data and software ☆68 · Updated 2 years ago
- Python package for deduplication/entity resolution using active learning ☆81 · Updated 11 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆25 · Updated 8 months ago
- Write Datasette canned queries as plain SQL files ☆14 · Updated 3 years ago
- Scraping and querying documents for LLMs ☆23 · Updated 2 months ago
- LegalCrawler: a tool for automated scraping of English legal corpora ☆54 · Updated 2 years ago
- A Datasette plugin providing an MLOps platform to train, evaluate, and predict with machine learning models ☆16 · Updated 2 weeks ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆18 · Updated 11 months ago
- ☆17 · Updated 2 years ago