dataiku / dss-plugin-nlp-preparation
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
★22 · Updated last month
Alternatives and similar repositories for dss-plugin-nlp-preparation:
Users who are interested in dss-plugin-nlp-preparation are comparing it to the libraries listed below:
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ★10 · Updated last year
- ★30 · Updated 2 years ago
- An example of how to use spaCy for extremely large files without running into memory issues ★36 · Updated 2 years ago
- Language detection using spaCy and fastText ★55 · Updated last year
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ★79 · Updated 7 months ago
- BERT models for many languages created from Wikipedia texts ★33 · Updated 4 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spaCy provide state of … ★61 · Updated 4 years ago
- In-the-wild extraction of entities that are found using Flair and displayed using a very elegant front-end. ★71 · Updated 2 years ago
- spaCy match and replace, maintaining conjugation ★35 · Updated 2 years ago
- A small repository to test Captum Explainable AI with a trained Flair transformers-based text classifier. ★26 · Updated 3 years ago
- Finds linguistic patterns effortlessly ★35 · Updated last year
- Topic inference with zero-shot models ★61 · Updated last year
- ★54 · Updated last year
- Tool to fix bitexts and tag near-duplicates for removal ★29 · Updated 2 weeks ago
- ★17 · Updated last year
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Hugging Face in your spaCy pipeline allowing you to i… ★46 · Updated 10 months ago
- A set of methods for finding an appropriate number of topics in a text collection ★15 · Updated 6 months ago
- Simple rule-based named entity recognition ★43 · Updated 3 years ago
- Bilingual sentence similarity classifier using TensorFlow ★20 · Updated 5 years ago
- Legal document similarity: code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ★31 · Updated 3 years ago
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ★27 · Updated 3 years ago
- Semantically distinct key phrase extraction using Hilbert hashes. ★48 · Updated 2 years ago
- Tool for parsing and converting various span encoding schemes. ★22 · Updated last year
- These are lists for a variety of languages containing words that are distinctive to each language. ★35 · Updated 2 years ago
- Using short models to classify long texts ★21 · Updated last year
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ★86 · Updated 3 years ago
- A tool for correcting misspellings in textual input using the Noisy Channel Model. ★11 · Updated 4 years ago
- A Python module to process data for Frame Semantic Parsing ★23 · Updated 4 years ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ★96 · Updated last year
- Preprocessing and analysis for training SNOMED-CT concept embeddings from the CORD-19 corpus ★14 · Updated last year