helpmefindaname / transformer-smaller-training-vocab
Temporarily remove unused tokens during training to save RAM and speed up training.
☆22 Updated 4 months ago
Related projects
Alternatives and complementary repositories for transformer-smaller-training-vocab
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 Updated last year
- BERT models for many languages created from Wikipedia texts ☆34 Updated 4 years ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆20 Updated last month
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 Updated last year
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆13 Updated 4 months ago
- ☆73 Updated 3 years ago
- Automatically detect errors in annotated corpora. ☆47 Updated last year
- A spaCy custom component that extracts and normalizes temporal expressions ☆52 Updated last year
- ☆27 Updated 3 months ago
- Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0. ☆101 Updated 2 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆82 Updated last month
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 Updated 9 months ago
- Generate BERT vocabularies and pretraining examples from Wikipedias ☆18 Updated 4 years ago
- A simple neural truecaser written in pytorch and allennlp. ☆32 Updated 5 months ago
- GC4LM: A Colossal (Biased) language model for German ☆13 Updated 3 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆40 Updated 2 years ago
- Tool for parsing and converting various span encoding schemes. ☆22 Updated 10 months ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated last year
- ☆21 Updated 3 years ago
- Efficient Sentence Embedding via Semantic Subspace Analysis ☆14 Updated 4 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 Updated last year
- Statistics on multilingual datasets ☆17 Updated 2 years ago
- Implementation of Nested Named Entity Recognition using Flair ☆24 Updated 3 years ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆76 Updated last week
- An embeddable annotation tool for end-to-end cross-document coreference ☆41 Updated last year
- Combining encoder-based language models ☆11 Updated 3 years ago
- ☆12 Updated 2 years ago
- zero-vocab or low-vocab embeddings ☆17 Updated 2 years ago
- Shared code for training sentence embeddings with Flax / JAX ☆27 Updated 3 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 5 months ago