BramVanroy / bicorpus-preprocessing
☆9 Updated 4 years ago
Related projects
Alternatives and complementary repositories for bicorpus-preprocessing
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 Updated 2 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated last year
- ☆70 Updated last year
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆24 Updated 3 years ago
- ☆29 Updated 2 years ago
- ☆13 Updated 4 years ago
- Black for Python docstrings and reStructuredText (rst). ☆16 Updated last year
- Generic Environment for Context-Aware Correction of Orthography ☆22 Updated 2 years ago
- spaCy match and replace, maintaining conjugation ☆34 Updated last year
- A web interface to understand language-specific BERT-models ☆17 Updated 7 months ago
- Scripts supporting the development and serving the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search ☆10 Updated last year
- ☆13 Updated 3 years ago
- 🔎 A Prodigy plugin for evaluating spaCy pipelines ☆12 Updated 7 months ago
- ALMa (Active Learning Manager) keeps track of labeled and unlabeled data for active learning ☆42 Updated 4 years ago
- Polyglot skipgram embeddings, and their many health benefits ☆11 Updated 4 years ago
- This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing. ☆11 Updated 3 years ago
- A utility for labeling clusters of text data. ☆28 Updated 3 years ago
- Tooling to play around with multilingual machine translation for Indian Languages. ☆21 Updated 2 years ago
- OpenNeuroSpell contains parts of NeuroSpell (http://neurospell.com/en.php) released as open-source. More code will be published as soon a… ☆20 Updated 3 weeks ago
- A Python library for creating adversarial splits ☆13 Updated 2 years ago
- ✨ Web interface for NeuralCoref coreference resolution ☆34 Updated last year
- 🌸 Train floret vectors ☆18 Updated last year
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 Updated 2 years ago
- This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP). ☆33 Updated 4 years ago
- Python bindings for Stanford CoreNLP's protobufs. ☆20 Updated 6 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 5 years ago
- Official details for: [1803.08493] Context is Everything: Finding Meaning Statistically in Semantic Spaces ☆39 Updated 5 years ago
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 Updated 9 months ago
- ☆22 Updated 2 years ago
- A simple neural truecaser written in PyTorch and AllenNLP. ☆32 Updated 5 months ago