nanowell / Differential-Transformer-PyTorch (View on GitHub)
PyTorch implementation of the Differential-Transformer architecture for sequence modeling, tailored as a decoder-only model in the style of large language models (LLMs). The architecture incorporates a novel Differential Attention mechanism together with a multi-head structure, RMSNorm, and SwiGLU.
86 · Oct 27, 2024 · Updated last year
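To illustrate the core idea behind Differential Attention, here is a minimal single-head sketch in PyTorch: two softmax attention maps are computed from paired query/key projections, and the second map, scaled by a learnable lambda, is subtracted from the first to cancel common-mode attention noise. The class and parameter names here are illustrative assumptions, not the repository's actual API, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttention(nn.Module):
    """Hypothetical single-head sketch of Differential Attention
    (names are illustrative; not the repo's actual interface)."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        # Double-width Q/K projections yield the two attention maps.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        # Learnable mixing coefficient (a scalar here for simplicity).
        self.lambda_ = nn.Parameter(torch.tensor(0.5))
        self.d_head = d_head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = self.d_head ** -0.5
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        # Differential attention: subtract the second softmax map.
        return (a1 - self.lambda_ * a2) @ v

x = torch.randn(2, 8, 32)
out = DiffAttention(d_model=32, d_head=16)(x)
print(out.shape)  # (batch, seq, d_head)
```

In the full architecture, this head would be replicated across multiple heads and combined with RMSNorm and a SwiGLU feed-forward block inside each decoder layer.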

Alternatives and similar repositories for Differential-Transformer-PyTorch

Users interested in Differential-Transformer-PyTorch are comparing it to the libraries listed below.

