asahi417 / lm-vocab-trimmer
Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting irrelevant tokens from its vocabulary. This repository contains a python-library vocabtrimmer, that remove irrelevant tokens from a multilingual LM vocabulary for the target language.
☆33Updated 3 months ago
Alternatives and similar repositories for lm-vocab-trimmer:
Users that are interested in lm-vocab-trimmer are comparing it to the libraries listed below
- ☆34Updated last year
- ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost☆40Updated last year
- Pytorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks☆63Updated 3 years ago
- This repository contains the code for paper Prompting ELECTRA Few-Shot Learning with Discriminative Pre-Trained Models.☆46Updated 2 years ago
- IntructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆31Updated 7 months ago
- A extension of Transformers library to include T5ForSequenceClassification class.☆37Updated last year
- BLOOM+1: Adapting BLOOM model to support a new unseen language☆70Updated 10 months ago
- KETOD Knowledge-Enriched Task-Oriented Dialogue☆32Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆45Updated last year
- PyTorch code for "FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization" (NAACL 2022)☆38Updated 2 years ago
- codes and pre-trained models of paper "Segatron: Segment-aware Transformer for Language Modeling and Understanding"☆18Updated 2 years ago
- ☆55Updated 2 years ago
- TBC☆26Updated 2 years ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning☆29Updated 2 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆26Updated 3 years ago
- The Multitask Long Document Benchmark☆38Updated 2 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…☆43Updated 5 months ago
- [ICLR 2023] PyTorch code of Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees☆23Updated last year
- Detect hallucinated tokens for conditional sequence generation.☆64Updated 2 years ago
- ☆28Updated 2 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval☆28Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer☆55Updated 2 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆30Updated 2 years ago
- Repo for "On Learning to Summarize with Large Language Models as References"☆44Updated last year
- Source code for paper "Learning from Noisy Labels for Entity-Centric Information Extraction", EMNLP 2021☆55Updated 3 years ago
- The official repository for Efficient Long-Text Understanding Using Short-Text Models (Ivgi et al., 2022) paper☆68Updated last year
- First explanation metric (diagnostic report) for text generation evaluation☆63Updated 6 months ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval☆40Updated 3 months ago
- The dataset and code for ACL 2022 paper "SciNLI: A Corpus for Natural Language Inference on Scientific Text" are released here.☆27Updated last year
- Can LLMs generate code-mixed sentences through zero-shot prompting?☆11Updated last year