dataiku / dss-plugin-nlp-preparation
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
⭐22 · Updated 9 months ago
Alternatives and similar repositories for dss-plugin-nlp-preparation
Users who are interested in dss-plugin-nlp-preparation are comparing it to the libraries listed below.
- In the wild extraction of entities that are found using Flair and displayed using a very elegant front-end. ⭐71 · Updated 2 years ago
- ⭐140 · Updated last year
- Semantically distinct key phrase extraction using Hilbert hashes. ⭐50 · Updated 3 years ago
- Use Hugging Face text and token classification pipelines directly in spaCy ⭐63 · Updated last year
- Post-processing OCR errors with seq2seq models ⭐28 · Updated 5 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spacy provide state of … ⭐61 · Updated 5 years ago
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ⭐68 · Updated 3 years ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ⭐98 · Updated 2 years ago
- Finds linguistic patterns effortlessly ⭐38 · Updated 2 years ago
- Named entity recognition for the legal domain ⭐42 · Updated 4 years ago
- Sentence transformers models for spaCy ⭐109 · Updated 2 years ago
- ⭐69 · Updated 3 years ago
- 🧪 Cutting-edge experimental spaCy components and features ⭐103 · Updated last year
- spaCy wrapper for ConceptNet ⭐95 · Updated 2 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ⭐156 · Updated last year
- spaCy match and replace, maintaining conjugation ⭐35 · Updated 2 years ago
- ⭐55 · Updated last year
- Fast edit distance Python extension written in Cython/C++. Supports Levenshtein distance and Damerau Optimal String Alignment (OSA) dista… ⭐24 · Updated 5 months ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ⭐37 · Updated last year
- Python package for deduplication/entity resolution using active learning ⭐82 · Updated last year
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ⭐94 · Updated 2 years ago
- Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy. ⭐120 · Updated 3 weeks ago
- Extract dates from text ⭐65 · Updated 4 years ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ⭐148 · Updated last year
- Simple rule-based named entity recognition ⭐42 · Updated 3 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ⭐81 · Updated last year
- Storage and retrieval of Word Embeddings in various databases ⭐51 · Updated 7 years ago
- Additional lookup tables and data resources for spaCy ⭐112 · Updated 5 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ⭐69 · Updated 2 years ago
- A spaCy custom component that extracts and normalizes temporal expressions ⭐55 · Updated 2 years ago