ZhuiyiTechnology / roformer
Rotary Transformer
☆939 · Updated 3 years ago
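RoFormer injects position information by rotating each query/key coordinate pair through a position-dependent angle, so the attention dot product depends only on relative offsets. Below is a minimal, illustrative PyTorch sketch of rotary position embeddings (RoPE); the `rotary_embed` helper is hypothetical and this is not the repository's actual implementation.

```python
# Minimal sketch of rotary position embeddings (RoPE); illustrative only,
# not the roformer repository's actual code.
import torch

def rotary_embed(x, base=10000.0):
    """Apply RoPE to x of shape (seq_len, dim); dim must be even."""
    seq_len, dim = x.shape
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    pos = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)          # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]          # split features into pairs
    # Rotate each 2-D pair by its position-dependent angle
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate queries and keys before the attention dot product,
# so q·k encodes relative position.
q = rotary_embed(torch.randn(128, 64))
k = rotary_embed(torch.randn(128, 64))
```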
Alternatives and similar repositories for roformer:
Users interested in roformer are comparing it to the libraries listed below.
- RoFormer V1 & V2 pytorch ☆495 · Updated 2 years ago
- Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time" ☆362 · Updated last year
- SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants. ☆1,075 · Updated 4 months ago
- A fast MoE impl for PyTorch ☆1,711 · Updated 2 months ago
- Code for the ALiBi method for transformer language models (ICLR 2022) ☆524 · Updated last year
- ☆876 · Updated 11 months ago
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆666 · Updated 5 months ago
- Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning" (ICLR 2022) ☆526 · Updated 3 years ago
- A novel method to tune language models. Codes and datasets for paper "GPT understands, too". ☆930 · Updated 2 years ago
- A plug-and-play library for parameter-efficient tuning (Delta Tuning) ☆1,026 · Updated 7 months ago
- Diffusion-LM ☆1,125 · Updated 8 months ago
- Rectified Rotary Position Embeddings ☆366 · Updated 11 months ago
- A purer tokenizer with a higher compression rate ☆475 · Updated 5 months ago
- Tutel MoE: Optimized Mixture-of-Experts Library, supports DeepSeek FP8/FP4 ☆809 · Updated this week
- Transformer based on a variant of attention with linear complexity with respect to sequence length ☆758 · Updated 11 months ago
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 ☆1,099 · Updated last year
- A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆730 · Updated last year
- Pytorch library for fast transformer implementations ☆1,697 · Updated 2 years ago
- [ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models ☆769 · Updated last year
- Reformer, the efficient Transformer, in Pytorch ☆2,163 · Updated last year
- Longformer: The Long-Document Transformer ☆2,112 · Updated 2 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆751 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,386 · Updated last year
- Real Transformer TeraFLOPS on various GPUs ☆899 · Updated last year
- Best practice for training LLaMA models in Megatron-LM ☆649 · Updated last year
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆641 · Updated 4 months ago
- A Transformer model based on the Gated Attention Unit (preview version) ☆97 · Updated 2 years ago
- [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention ☆190 · Updated 2 years ago
- ACL'2023: DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models ☆307 · Updated last year
- Prefix-Tuning: Optimizing Continuous Prompts for Generation ☆919 · Updated last year