HamzaElshafie / gpt-oss-20B

A PyTorch implementation of the GPT-OSS-20B architecture, with every component coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with clamping and a residual connection, Mixture-of-Experts (MoE), self-attention with learned sinks, banded (sliding-window) attention, grouped-query attention (GQA), and a KV cache.
209 · Dec 2, 2025 · Updated 2 months ago

Alternatives and similar repositories for gpt-oss-20B

Users who are interested in gpt-oss-20B are comparing it to the libraries listed below.
