helpmefindaname / transformer-smaller-training-vocab
Temporarily removes unused tokens during training to save RAM and speed up training.
☆22 · Updated 8 months ago
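The one-line description above summarizes the idea: shrink the vocabulary to only the tokens that occur in the training data, train with the smaller embedding matrix, then restore the full vocabulary afterwards. Below is a minimal, hypothetical sketch of that general idea in plain Python. It is not this repository's API. All function names are invented for illustration, and the embedding matrix is assumed to be a nested list standing in for a real framework tensor (for example the weight of a PyTorch embedding layer).

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # assumption: one row per token id


def build_reduced_vocab(
    token_id_seqs: Sequence[Sequence[int]],
) -> Tuple[List[int], Dict[int, int]]:
    """Collect the token ids that actually occur in the training data."""
    used = sorted({tid for seq in token_id_seqs for tid in seq})
    # Map each original id to a compact new id (0, 1, 2, ...).
    old_to_new = {old: new for new, old in enumerate(used)}
    return used, old_to_new


def shrink_embeddings(embeddings: Matrix, used: List[int]) -> Matrix:
    """Keep only the embedding rows of the used token ids."""
    return [list(embeddings[old]) for old in used]


def remap(seq: Sequence[int], old_to_new: Dict[int, int]) -> List[int]:
    """Rewrite a token id sequence into the reduced id space."""
    return [old_to_new[tid] for tid in seq]


def restore_embeddings(full: Matrix, reduced: Matrix, used: List[int]) -> Matrix:
    """Copy trained rows back into a full-size matrix.

    Rows for tokens never seen in training keep their original values.
    """
    restored = [list(row) for row in full]
    for new, old in enumerate(used):
        restored[old] = list(reduced[new])
    return restored


if __name__ == "__main__":
    data = [[0, 3, 3], [5, 0]]                       # toy token id sequences
    emb = [[float(i), float(i)] for i in range(6)]   # toy 6-token vocabulary
    used, old_to_new = build_reduced_vocab(data)
    small = shrink_embeddings(emb, used)             # 3 rows instead of 6
    batch = [remap(seq, old_to_new) for seq in data]
    # Stand-in for training: nudge every row of the reduced matrix.
    trained = [[x + 0.5 for x in row] for row in small]
    print(restore_embeddings(emb, trained, used))
```

Running the script updates only rows 0, 3 and 5 of the toy matrix; the other rows come back unchanged. The memory saving in a real model comes from the embedding (and any tied output) matrix having as many rows as the reduced vocabulary during training.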
Alternatives and similar repositories for transformer-smaller-training-vocab:
Users who are interested in transformer-smaller-training-vocab are comparing it to the libraries listed below.
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Updated last year
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 · Updated 2 years ago
- Tool for parsing and converting various span encoding schemes. ☆23 · Updated last year
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆77 · Updated 3 weeks ago
- A lightweight but powerful library to build token indices for NLP tasks, compatible with major Deep Learning frameworks like PyTorch and … ☆51 · Updated 3 months ago
- zero-vocab or low-vocab embeddings ☆18 · Updated 2 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated 2 years ago
- ☆17 · Updated last year
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆15 · Updated 9 months ago
- ☆87 · Updated 2 years ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 · Updated 2 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆84 · Updated 3 weeks ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 · Updated 2 years ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03. ☆23 · Updated 8 months ago
- Generate BERT vocabularies and pretraining examples from Wikipedias ☆18 · Updated 4 years ago
- ☆43 · Updated last year
- ☆21 · Updated 3 years ago
- Combining encoder-based language models ☆11 · Updated 3 years ago
- ☆13 · Updated 3 years ago
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ☆27 · Updated 3 years ago
- ☆26 · Updated last month
- A simple neural truecaser written in pytorch and allennlp. ☆33 · Updated 9 months ago
- ☆75 · Updated 3 years ago
- Semantically Structured Sentence Embeddings ☆65 · Updated 5 months ago
- ☆64 · Updated 2 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 · Updated last year
- ☆24 · Updated 5 years ago
- Automatically detect errors in annotated corpora. ☆47 · Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated last year
- A set of methods for finding an appropriate number of topics in a text collection ☆15 · Updated last week