kootenpv / tok
Fast and customizable tokenization
☆64 · Updated 5 years ago
Alternatives and similar repositories for tok:
Users who are interested in tok are comparing it to the libraries listed below.
- Find strings/words in text; convenience and C speed ☆126 · Updated 2 years ago
- Textpipe: clean and extract metadata from text ☆302 · Updated 3 years ago
- Lightning Fast Language Prediction ☆166 · Updated 6 years ago
- ALMa (Active Learning Manager): keeps track of labeled and unlabeled data for active learning ☆41 · Updated 4 years ago
- A fast and memory-optimized string library for heavy-text manipulation in Python ☆250 · Updated 4 years ago
- A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python ☆181 · Updated last year
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆51 · Updated 3 months ago
- An easy-to-use open-source library for advanced Deep Learning and Natural Language Processing ☆112 · Updated 7 months ago
- Numeric fused-head identification and resolution ☆33 · Updated 5 years ago
- Code and data accompanying the paper "Approaching nested named entity recognition with parallel LSTM-CRFs." ☆26 · Updated 2 years ago
- Example using Polyaxon to experiment with pre-training spaCy ☆65 · Updated 3 years ago
- Enso: An Open Source Library for Benchmarking Embeddings + Transfer Learning Methods ☆95 · Updated 4 years ago
- Official details for: [1803.08493] Context is Everything: Finding Meaning Statistically in Semantic Spaces ☆39 · Updated 5 years ago
- A Python module for word inflections designed for use with spaCy. ☆92 · Updated 5 years ago
- allennlp + streamlit demo ☆22 · Updated 5 years ago
- This repository contains code to replicate the no-longer publicly available Toronto BookCorpus dataset ☆49 · Updated 2 years ago
- Jupyter extension to visualize dependency structures ☆28 · Updated 6 years ago
- ☆70 · Updated 2 years ago
- spaCy + UDPipe ☆161 · Updated 2 years ago
- Easy-to-use text representations extraction library based on the Transformers library. ☆32 · Updated 2 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 · Updated 3 years ago
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ☆67 · Updated 2 years ago
- Polyglot skipgram embeddings, and their many health benefits ☆12 · Updated 5 years ago
- A fully customisable language detection pipeline for spaCy ☆92 · Updated 5 years ago
- Hunspell extension for spaCy 2.0. ☆94 · Updated 7 months ago
- Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings ☆77 · Updated 2 years ago
- Scripts as a service. Builds on systemd (for Linux) ☆20 · Updated last year
- NER, syntax markup visualizations ☆138 · Updated last year
- Python stream processing for humans ☆185 · Updated last month
- Language detection extension for spaCy 2.0+ ☆112 · Updated 6 years ago