ZhuiyiTechnology / roformer
Rotary Transformer
☆811 · Updated 2 years ago
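For context, rotary position embedding (RoPE), the technique this repository introduces, encodes absolute positions by rotating each even/odd pair of query and key features, so that the query–key dot product depends only on the relative position. Below is a minimal PyTorch sketch of the idea; the function names, shapes, and defaults are illustrative assumptions, not the API of ZhuiyiTechnology/roformer.

```python
# Minimal rotary position embedding (RoPE) sketch in PyTorch.
# Illustrative only: names and shapes are assumptions, not this repo's API.
import torch

def rope_cache(seq_len, head_dim, base=10000.0):
    # Per-pair rotation frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]  # (seq, head_dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # x: (batch, seq, heads, head_dim); rotate each (even, odd) feature pair
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos = cos[None, :, None, :]  # broadcast over batch and heads
    sin = sin[None, :, None, :]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate queries and keys before attention; their dot product then
# depends only on the relative position difference.
q = torch.randn(2, 128, 8, 64)
k = torch.randn(2, 128, 8, 64)
cos, sin = rope_cache(seq_len=128, head_dim=64)
q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
```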
Related projects
Alternatives and complementary repositories for roformer
- RoFormer V1 & V2 PyTorch ☆473 · Updated 2 years ago
- Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" ☆347 · Updated last year
- ☆870 · Updated 5 months ago
- A plug-and-play library for parameter-efficient tuning (Delta Tuning) ☆996 · Updated last month
- Code for the ALiBi method for transformer language models (ICLR 2022) ☆506 · Updated last year
- PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538) ☆974 · Updated 6 months ago
- A fast MoE implementation for PyTorch ☆1,560 · Updated 4 months ago
- SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants ☆991 · Updated last week
- A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too" ☆923 · Updated 2 years ago
- Implementation of the paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022) ☆517 · Updated 2 years ago
- Rectified Rotary Position Embeddings ☆338 · Updated 5 months ago
- Tutel MoE: An Optimized Mixture-of-Experts Implementation ☆728 · Updated last week
- Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch ☆565 · Updated last month
- A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆637 · Updated last year
- A purer tokenizer with a higher compression rate ☆447 · Updated 6 months ago
- Prefix-Tuning: Optimizing Continuous Prompts for Generation ☆894 · Updated 6 months ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆466 · Updated 8 months ago
- Diffusion-LM ☆1,055 · Updated 3 months ago
- Must-read papers on Parameter-Efficient Tuning (Delta Tuning) methods for pre-trained models ☆274 · Updated last year
- Ongoing research training transformer language models at scale, including BERT & GPT-2 ☆1,335 · Updated 7 months ago
- Implementation of "Attention Is Off By One" by Evan Miller ☆182 · Updated last year
- [NeurIPS 2023] RRHF & Wombat ☆797 · Updated last year
- An implementation of local windowed attention for language modeling ☆383 · Updated 2 months ago
- ACL 2023: DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models ☆292 · Updated 8 months ago
- [ICLR 2022] Official implementation of cosformer attention in "cosFormer: Rethinking Softmax in Attention" ☆179 · Updated last year
- Best practices for training LLaMA models in Megatron-LM ☆627 · Updated 10 months ago
- Root Mean Square Layer Normalization ☆212 · Updated last year
- Transformer based on a variant of attention that is linear in complexity with respect to sequence length ☆695 · Updated 6 months ago
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆229 · Updated 9 months ago
- A Transformer model based on the Gated Attention Unit (preview version) ☆97 · Updated last year