HamzaElshafie/gpt-oss-20B

A PyTorch implementation of the GPT-OSS-20B architecture. All components are coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with clamping and residual connection, Mixture-of-Experts (MoE), Self-Attention with learned sinks, banded attention, GQA, and KV-cache.

☆238

Alternatives and similar repositories for gpt-oss-20B

Users who are interested in gpt-oss-20B are comparing it to the repositories listed below.

HamzaElshafie / h100_gemm
A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA.
☆79 · Feb 18, 2026 · Updated 5 months ago
Mog9 / gpt2-inference
A GPT-2 inference engine written from scratch in CUDA and C++. Implements custom CUDA kernels for tiled matrix multiplication, LayerNorm,…
☆42 · May 17, 2026 · Updated 2 months ago
dataflowr / gpu_llm_flash-attention
Course on Flash-attention in Triton
☆100 · Feb 9, 2026 · Updated 5 months ago
ZihanWang314 / coeCheck
☆19 · Mar 3, 2025 · Updated last year
Infatoshi / physics-llm-inference
Companion code for The Physics of LLM Inference book
☆26 · Apr 21, 2026 · Updated 3 months ago
aerlabsAI / nano-vllm
☆15 · Mar 11, 2026 · Updated 4 months ago
danielvegamyhre / gemm
☆19 · Mar 29, 2026 · Updated 3 months ago
Infatoshi / MegaQwen
Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch)
☆117 · Feb 10, 2026 · Updated 5 months ago
modal-projects / modal-jazz
we have ai at home
☆117 · Jun 18, 2026 · Updated last month
luongthecong123 / fp8-quant-matmul
Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.
☆19 · Feb 9, 2026 · Updated 5 months ago
aryagxr / cuda
coding CUDA everyday!
☆77 · Feb 5, 2026 · Updated 5 months ago
RightNow-AI / qwen3.5-triton
Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200
☆121 · Feb 28, 2026 · Updated 4 months ago
j4orz / ateenysitp
a whirlwind tour to deep learning and deep learning systems
☆81 · Updated this week
anakin87 / llm-rl-environments-lil-course
🌱 A little course on Reinforcement Learning Environments for evaluating and training Language Models
☆216 · May 27, 2026 · Updated last month
JINO-ROHIT / ml-systems-notes
a personal collection of my notes for ml sys
☆107 · Updated this week
JINO-ROHIT / tachyon
an LLM inference engine to run on consumer hardware
☆46 · Apr 15, 2026 · Updated 3 months ago
JINO-ROHIT / advanced_ml
☆132 · Dec 9, 2025 · Updated 7 months ago
AndreSlavescu / mHC.cu
mHC kernels implemented in CUDA
☆264 · Mar 9, 2026 · Updated 4 months ago
Dogacel / auto-gpu-kernel
Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average…
☆148 · Jun 10, 2026 · Updated last month
kabir2505 / tiny-mixtral
☆44 · May 4, 2025 · Updated last year
pavanjava / qql
SQL-like query language and CLI for Qdrant vector search engine
☆46 · Jun 13, 2026 · Updated last month
SzymonOzog / Penny
Hand-Rolled GPU communications library
☆96 · Nov 25, 2025 · Updated 7 months ago
datavorous / inference-engineering
documenting my work in inference engineering
☆25 · Apr 19, 2026 · Updated 3 months ago
YuvrajSingh-mist / smolcluster
An educational distributed training and inference library for neural nets using local computing
☆72 · Jun 10, 2026 · Updated last month
sgl-project / mini-sglang
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
☆4,616 · May 17, 2026 · Updated 2 months ago
mohammed840 / RLM-implementation
RVAA: Recursive Vision-Action Agent for Long Video Understanding. Implementation of the RLM paradigm (Zhang, Kraska, Khattab 2025)
☆119 · Jan 15, 2026 · Updated 6 months ago
idoatad / TensorLens
Official PyTorch implementation for "TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors" [ACL 2026]
☆47 · Apr 14, 2026 · Updated 3 months ago
ovshake / nano-vllm
a fun and educational take on vLLM
☆212 · Jan 25, 2026 · Updated 5 months ago
kabir2505 / Deep-Learning-History
Deep learning paper implementations
☆19 · May 15, 2025 · Updated last year
ChinmayK0607 / heiretsu
Educational WIP
☆73 · Feb 16, 2026 · Updated 5 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆279 · Mar 9, 2025 · Updated last year
olivkoch / TinyRecursiveModels
☆35 · Nov 11, 2025 · Updated 8 months ago
mesutoezdil / Systematic-CUDA-Learning
Personal CUDA learning repo, built step by step from scratch.
☆95 · Jun 14, 2026 · Updated last month
huggingface / nanowhale
☆380 · May 4, 2026 · Updated 2 months ago
SJTU-IPADS / MetaAttention
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends (PPoPP'26)
☆16 · Dec 31, 2025 · Updated 6 months ago
YuvrajSingh-mist / Paper-Replications
A repository consisting of paper/architecture replications of classic/SOTA AI/ML papers in PyTorch
☆425 · Nov 11, 2025 · Updated 8 months ago
jammastergirish / BuildAnLLM
☆174 · May 29, 2026 · Updated last month
Snektron / gpumode-amd-fp8-mm
My submission for the GPUMODE/AMD fp8 mm challenge
☆29 · Jun 4, 2025 · Updated last year
LeiWang1999 / TVM.CMakeExtend
Tutorials of Extending and importing TVM with CMAKE Include dependency.
☆16 · Oct 11, 2024 · Updated last year