john-hewitt / backpacks-flash-attn
The original Backpack Language Model implementation, a fork of FlashAttention
☆67Updated last year
Alternatives and similar repositories for backpacks-flash-attn:
Users who are interested in backpacks-flash-attn are comparing it to the repositories listed below.
- This is the official repository for "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts" (EMNLP 2022)☆100Updated 2 years ago
- ☆24Updated 2 years ago
- DEMix Layers for Modular Language Modeling☆53Updated 3 years ago
- ☆49Updated last year
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training☆20Updated 8 months ago
- DiffusER: Discrete Diffusion via Edit-based Reconstruction (Reid, Hellendoorn & Neubig, 2022)☆54Updated 2 years ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective☆30Updated last year
- ☆128Updated 2 years ago
- TBC☆26Updated 2 years ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning☆98Updated last year
- Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models☆140Updated 2 years ago
- ☆21Updated 2 years ago
- [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.☆99Updated 2 years ago
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)☆30Updated 3 years ago
- contrastive decoding☆199Updated 2 years ago
- This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.1…☆130Updated last year
- Retrieval as Attention☆83Updated 2 years ago
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings☆55Updated 10 months ago
- PyTorch reimplementation of REALM and ORQA☆22Updated 3 years ago
- ☆85Updated 2 years ago
- [NeurIPS 2022] Generating Training Data with Language Models: Towards Zero-Shot Language Understanding☆64Updated 2 years ago
- ☆34Updated 2 years ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]☆29Updated 10 months ago
- Code base of In-Context Learning for Dialogue State tracking☆45Updated last year
- ☆19Updated 2 years ago
- One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning☆39Updated last year
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).☆25Updated 8 months ago
- The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".☆69Updated last year
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models☆23Updated 8 months ago
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"☆76Updated 2 years ago