nanowell / Differential-Transformer-PyTorch

PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines a Differential Attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.
66 · Updated 7 months ago
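
For readers unfamiliar with the components named in the description, the following is a minimal sketch and not the repository's code. It assumes the usual formulation of differential attention as the difference of two softmax attention maps, softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d)), applied to V. The function names (`rms_norm`, `swiglu`, `differential_attention`) are hypothetical, `lam` is passed in as a fixed scalar rather than learned, RMSNorm is shown without a learned gain, and only a single head is handled.

```python
import math

import torch
import torch.nn.functional as F


def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """RMSNorm without a learned gain (assumption): scale by 1 / RMS over the last dim."""
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)


def swiglu(x: torch.Tensor, w_gate: torch.Tensor, w_up: torch.Tensor,
           w_down: torch.Tensor) -> torch.Tensor:
    """SwiGLU feed-forward: (SiLU(x @ w_gate) * (x @ w_up)) @ w_down."""
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down


def differential_attention(q1: torch.Tensor, k1: torch.Tensor,
                           q2: torch.Tensor, k2: torch.Tensor,
                           v: torch.Tensor, lam: float) -> torch.Tensor:
    """Single-head causal differential attention with a fixed scalar lam (assumption)."""
    seq_len, d = q1.shape[-2], q1.shape[-1]
    # True above the diagonal marks future positions that must not be attended to.
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

    def causal_softmax(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
        scores = (q @ k.transpose(-2, -1)) / math.sqrt(d)
        return F.softmax(scores.masked_fill(future, float("-inf")), dim=-1)

    # The second attention map is subtracted from the first, weighted by lam.
    weights = causal_softmax(q1, k1) - lam * causal_softmax(q2, k2)
    return weights @ v
```

With `lam = 0` the function reduces to ordinary causal softmax attention, which makes it easy to compare against a standard implementation.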

Alternatives and similar repositories for Differential-Transformer-PyTorch

Users who are interested in Differential-Transformer-PyTorch compare it to the libraries listed below.
