explosion / curated-tokenizers
Lightweight piece tokenization library
☆12 Updated 11 months ago
Alternatives and similar repositories for curated-tokenizers:
Users who are interested in curated-tokenizers are comparing it to the libraries listed below.
- Library for fast text representation and classification. ☆28 Updated last year
- NLP tasks with zero- and few-shot models. ☆14 Updated this week
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated 11 months ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 Updated last year
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 Updated last year
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆24 Updated 4 months ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 Updated last year
- Using short models to classify long texts ☆21 Updated 2 years ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆18 Updated last month
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆15 Updated 9 months ago
- Multilingual Open Text ☆25 Updated 5 months ago
- ☆21 Updated 3 years ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 Updated 3 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆84 Updated 3 weeks ago
- CMU Linguistic Annotation Backend ☆15 Updated 11 months ago
- ☆26 Updated last month
- A library for data streaming and augmentation ☆20 Updated last year
- 🧪 Cutting-edge experimental spaCy components and features ☆98 Updated 11 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆30 Updated 2 years ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆16 Updated last year
- spaCy entry points for Curated Transformers ☆27 Updated 6 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 10 months ago
- ☆21 Updated 2 months ago
- ☆28 Updated last year
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago
- A tiny BERT for low-resource monolingual models ☆31 Updated 6 months ago
- Sentence transformers models for spaCy ☆107 Updated 2 years ago
- Pre-train Static Word Embeddings ☆51 Updated 3 weeks ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆23 Updated 8 months ago