A PyTorch implementation of the GPT-OSS-20B architecture. All components are coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with clamping and residual connection, Mixture-of-Experts (MoE), Self-Attention with learned sinks, banded attention, GQA, and KV-cache.
☆219 · Dec 2, 2025 · Updated 3 months ago
Alternatives and similar repositories for gpt-oss-20B
Users who are interested in gpt-oss-20B are comparing it to the repositories listed below
- Research on training an LLM with DeepSeek & Kimi architecture ☆41 · Sep 30, 2025 · Updated 5 months ago
- This repository hosts the code to port NumPy model weights of BiT-ResNets to TensorFlow SavedModel format. ☆14 · Dec 21, 2021 · Updated 4 years ago
- Neural Arithmetic Logic Units by Trask et al. ☆12 · Apr 10, 2019 · Updated 6 years ago
- This repository hosts code for converting the original MLP Mixer models (JAX) to TensorFlow. ☆15 · Sep 29, 2021 · Updated 4 years ago
- Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow. ☆20 · Aug 4, 2021 · Updated 4 years ago
- ☆16 · Jun 20, 2023 · Updated 2 years ago
- Showcases the use of deep learning to detect wheat heads from crops. The project is based on: https://www.kaggle.com/c/global-wheat-detec… ☆18 · May 30, 2020 · Updated 5 years ago
- This repository holds files and scripts for incorporating simple CI/CD practices for model training in ML. ☆21 · Oct 26, 2021 · Updated 4 years ago
- Contains code to demonstrate distributed training in TensorFlow 2 with AI Platform and custom Docker containers. ☆20 · Apr 28, 2021 · Updated 4 years ago
- ☆23 · Oct 30, 2019 · Updated 6 years ago
- This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Data… ☆27 · Oct 20, 2022 · Updated 3 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆34 · Sep 15, 2023 · Updated 2 years ago
- [CVPR 2026] Official code of "EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding" ☆38 · Updated this week
- PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation ☆33 · Dec 29, 2021 · Updated 4 years ago
- Project demonstrating dual model deployment scenarios using Vertex AI (GCP). ☆34 · Dec 28, 2021 · Updated 4 years ago
- All Resources from Stanford CS106B 2021 ☆24 · Jul 11, 2025 · Updated 8 months ago
- ☆14 · Mar 2, 2026 · Updated last week
- MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the Minimax M2 model are co… ☆30 · Feb 18, 2026 · Updated 3 weeks ago
- Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space" ☆65 · Dec 17, 2025 · Updated 2 months ago
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆66 · Mar 7, 2026 · Updated last week
- Experiments with the ideas presented in https://arxiv.org/abs/2003.00152 by Frankle et al. ☆29 · Aug 21, 2020 · Updated 5 years ago
- This repository is a reimplementation of the paper (BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model: htt… ☆11 · Nov 14, 2019 · Updated 6 years ago
- Mixture of Experts from scratch ☆13 · Apr 12, 2024 · Updated last year
- An Awesome list of AI tools that are powered by ChatGPT / Whisper and Stable Diffusion or are useful to developers in that domain ☆10 · Jul 26, 2023 · Updated 2 years ago
- ☆10 · Apr 7, 2025 · Updated 11 months ago
- This repository shows how to implement a basic model for multimodal entailment. ☆10 · Aug 17, 2021 · Updated 4 years ago
- Code for the experiments in the ACL 2020 paper "Estimating predictive uncertainty for rumour verification models" ☆11 · May 15, 2020 · Updated 5 years ago
- Official implementation of FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment ☆30 · Feb 24, 2026 · Updated 2 weeks ago
- web audio player ☆21 · Mar 3, 2011 · Updated 15 years ago
- ☆11 · Nov 30, 2023 · Updated 2 years ago
- decontamination ☆26 · Mar 4, 2026 · Updated last week
- DiFSD: Ego-Centric Fully Sparse Paradigm for End-to-End Self-Driving ☆14 · Mar 9, 2025 · Updated last year
- PyTorch implementation of RetinaNet with a CUDA-accelerated NMS operation. ☆10 · Jul 8, 2019 · Updated 6 years ago
- Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control ☆37 · Feb 22, 2026 · Updated 2 weeks ago
- Data and code for analyzing Movie Lead Gender. ☆10 · Apr 13, 2016 · Updated 9 years ago
- ☆10 · Jul 27, 2020 · Updated 5 years ago
- Minimal implementation of PAWS (https://arxiv.org/abs/2104.13963) in TensorFlow. ☆45 · May 25, 2021 · Updated 4 years ago
- Fuzzing solmate with medusa ☆10 · Aug 14, 2023 · Updated 2 years ago
- solutions for advent of code 2018 ☆17 · Dec 19, 2018 · Updated 7 years ago