asahi417 / lm-vocab-trimmer
Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting irrelevant tokens from its vocabulary. This repository contains a python-library vocabtrimmer, that remove irrelevant tokens from a multilingual LM vocabulary for the target language.
☆31Updated 3 weeks ago
Related projects ⓘ
Alternatives and complementary repositories for lm-vocab-trimmer
- IntructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆28Updated 5 months ago
- ☆55Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆44Updated last year
- KETOD Knowledge-Enriched Task-Oriented Dialogue☆31Updated last year
- BLOOM+1: Adapting BLOOM model to support a new unseen language☆70Updated 8 months ago
- Pytorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks☆63Updated 2 years ago
- ☆19Updated last year
- ☆33Updated last year
- [ICLR 2023] PyTorch code of Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees☆23Updated last year
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…☆43Updated 3 months ago
- ☆28Updated 2 years ago
- ☆15Updated 3 months ago
- The Multitask Long Document Benchmark☆38Updated 2 years ago
- Detect hallucinated tokens for conditional sequence generation.☆63Updated 2 years ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval☆38Updated last month
- ☆17Updated 10 months ago
- ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost☆39Updated last year
- PyTorch code for "FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization" (NAACL 2022)☆38Updated 2 years ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆41Updated 10 months ago
- An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation☆26Updated 5 months ago
- PyTorch reimplementation of REALM and ORQA☆22Updated 2 years ago
- ☆29Updated last year
- Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022)☆76Updated last year
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.☆40Updated 11 months ago
- ☆14Updated last month
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆26Updated 3 years ago
- Emotion-Aware Dialogue Response Generation by Multi-Task Learning☆13Updated 2 years ago
- The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper☆67Updated last year
- ☆37Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning☆29Updated last year