A PyTorch implementation of the GPT-OSS-20B architecture, with all components coded from scratch: RoPE with YaRN scaling, RMSNorm, SwiGLU with clamping and a residual connection, Mixture-of-Experts (MoE), self-attention with learned sinks, banded (sliding-window) attention, grouped-query attention (GQA), and a KV cache.
☆226 · Dec 2, 2025 · Updated 4 months ago
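A minimal pure-Python sketch of three of the components named above: RMSNorm, the clamped SwiGLU with a `+1` residual on the linear branch, and the softmax with a learned attention sink. It is written for illustration and is not taken from the repository, which works on PyTorch tensors. The function names are hypothetical, and the defaults (`eps=1e-5`, `alpha=1.702`, `limit=7.0`) are assumptions meant to mirror the published gpt-oss reference configuration; check them against the repository before relying on them.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-5) -> list[float]:
    """RMSNorm: divide by the root-mean-square of x, then apply a learned per-channel gain.

    Unlike LayerNorm there is no mean subtraction and no bias.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def swiglu_clamped(gate: list[float], linear: list[float],
                   alpha: float = 1.702, limit: float = 7.0) -> list[float]:
    """Clamped SwiGLU (assumed gpt-oss variant).

    `gate` and `linear` are the two halves of an expert's up-projection output.
    The gate is clamped from above only, the linear branch on both sides, and
    1 is added to the linear branch so the unit keeps a residual-like path.
    """
    out = []
    for g, l in zip(gate, linear):
        g = min(g, limit)
        l = max(-limit, min(l, limit))
        glu = g * _sigmoid(alpha * g)  # Swish with slope alpha
        out.append(glu * (l + 1.0))
    return out


def softmax_with_sink(scores: list[float], sink_logit: float) -> list[float]:
    """Attention weights for one query, with one learned sink logit per head.

    The sink joins the softmax normalization but has no value vector, so its
    share of the probability mass is dropped: the returned weights sum to < 1.
    """
    m = max(max(scores), sink_logit)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink_logit - m)
    return [e / denom for e in exps]


if __name__ == "__main__":
    print(rms_norm([1.0, -2.0, 3.0, -4.0], [1.0] * 4))
    print(swiglu_clamped([10.0, -3.0], [0.5, 20.0]))
    w = softmax_with_sink([2.0, 1.0, 0.5], sink_logit=1.5)
    print(w, sum(w))
```

The sink lets a head put probability mass "nowhere" instead of being forced to attend to some token; with a very negative sink logit the weights reduce to an ordinary softmax.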
Alternatives and similar repositories for gpt-oss-20B
Users who are interested in gpt-oss-20B are comparing it to the libraries listed below.
- Ambassador Paper for Innovative Use of NLP for Building Educational Applications 2023: Is ChatGPT a Good Teacher Coach? Measuring Zero… · ☆14 · Jul 21, 2024 · Updated last year
- This repository hosts the code to port NumPy model weights of BiT-ResNets to TensorFlow SavedModel format. · ☆14 · Dec 21, 2021 · Updated 4 years ago
- Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow. · ☆20 · Aug 4, 2021 · Updated 4 years ago
- Neural Arithmetic Logic Units by Trask et al. · ☆12 · Apr 10, 2019 · Updated 6 years ago
- How to build an ACP compliant agent that uses MCP as well! · ☆11 · May 6, 2025 · Updated 11 months ago
- Code for the article series on building a Python compiler and interpreter · ☆11 · Feb 13, 2025 · Updated last year
- Kaggle-Bag of Words Meets Bags of Popcorn · ☆23 · Jul 2, 2015 · Updated 10 years ago
- Turn small JavaScript functions into GPT function calls · ☆12 · Aug 23, 2023 · Updated 2 years ago
- ☆17 · Feb 14, 2024 · Updated 2 years ago
- PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation · ☆32 · Dec 29, 2021 · Updated 4 years ago
- An AI character interaction system with emotional modeling and advanced memory management · ☆17 · Oct 26, 2024 · Updated last year
- Learn how to design large-scale systems. Prep for the system design interview. An update to the original system-design-primer · ☆33 · Jan 12, 2026 · Updated 2 months ago
- Mixture of Experts from scratch · ☆13 · Apr 12, 2024 · Updated last year
- Contains code to demonstrate distributed training in TensorFlow 2 with AI Platform and custom Docker containers. · ☆20 · Apr 28, 2021 · Updated 4 years ago
- This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Data… · ☆27 · Oct 20, 2022 · Updated 3 years ago
- GEMM · ☆10 · Aug 26, 2023 · Updated 2 years ago
- Linux kernel for SHIELD · ☆23 · Mar 12, 2015 · Updated 11 years ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. · ☆13 · Jun 7, 2023 · Updated 2 years ago
- collab-dev - Collaboration Metrics for Code Reviews · ☆23 · May 12, 2025 · Updated 10 months ago
- ☆22 · Mar 30, 2026 · Updated last week
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API · ☆35 · Sep 15, 2023 · Updated 2 years ago
- ☆11 · Oct 14, 2022 · Updated 3 years ago
- ☆11 · Sep 21, 2022 · Updated 3 years ago
- ☆23 · Oct 30, 2019 · Updated 6 years ago
- JAX implementation of Proximal Policy Optimization (PPO) specifically tuned for Procgen, with benchmarked results and saved model weights… · ☆59 · Aug 4, 2022 · Updated 3 years ago
- gpt from 0 -> 1 · ☆11 · Oct 9, 2025 · Updated 6 months ago
- ☆59 · Dec 12, 2025 · Updated 3 months ago
- [arXiv 2026] Official PyTorch Repository for "Coarse-Guided Visual Generation via Weighted h-Transform Sampling" · ☆41 · Mar 16, 2026 · Updated 3 weeks ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX, including CUDA/ROCm/... devices with RDMA. · ☆30 · Mar 26, 2026 · Updated 2 weeks ago
- ☆19 · Mar 3, 2025 · Updated last year
- A simple, generic, and flexible keyframe animation library for Rust. · ☆30 · Mar 27, 2026 · Updated last week
- Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updated every 8 hours) · ☆10 · Updated this week
- mysql-3.23.49 · ☆11 · Jun 28, 2014 · Updated 11 years ago
- Custom ComfyUI node that combines VSR + VFI and allows streaming processing for arbitrary video length. · ☆57 · Mar 28, 2026 · Updated last week
- My tests and experiments with some popular DL frameworks. · ☆17 · Sep 11, 2025 · Updated 6 months ago
- Cute layout visualization · ☆32 · Jan 18, 2026 · Updated 2 months ago
- GEMV implementation with CUTLASS · ☆19 · Aug 21, 2025 · Updated 7 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models · ☆61 · Feb 7, 2025 · Updated last year
- Clust_mgr is an important component of KunlunBase. It provides an HTTP API for KunlunBase users to do cluster management, provisioning and … · ☆10 · Jun 13, 2023 · Updated 2 years ago