Jaykef / Triton-nanoGPT
Custom Triton kernels for training Karpathy's nanoGPT.
☆19 · Oct 21, 2024 · Updated last year
Alternatives and similar repositories for Triton-nanoGPT
Users who are interested in Triton-nanoGPT are comparing it to the libraries listed below.
- ☆14 · Mar 2, 2025 · Updated 11 months ago
- ☆42 · Jan 24, 2026 · Updated 3 weeks ago
- "PyTorch in Rust" ☆17 · Feb 13, 2024 · Updated 2 years ago
- ☆22 · May 5, 2025 · Updated 9 months ago
- Official implementation of Adaptive Feature Transfer (AFT) ☆23 · Jun 12, 2024 · Updated last year
- NumPy+Jax with named axes and an uncompromising attitude ☆23 · Mar 4, 2025 · Updated 11 months ago
- minimal Energy-based transformer ☆43 · Dec 11, 2025 · Updated 2 months ago
- Simple MPI implementation for prototyping or learning ☆300 · Aug 6, 2025 · Updated 6 months ago
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 · Jul 30, 2020 · Updated 5 years ago
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆83 · May 22, 2025 · Updated 8 months ago
- Awesome Triton Resources ☆39 · Apr 27, 2025 · Updated 9 months ago
- ☆35 · Apr 12, 2024 · Updated last year
- ☆26 · Dec 3, 2025 · Updated 2 months ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆40 · Mar 2, 2023 · Updated 2 years ago
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro… ☆46 · Sep 2, 2025 · Updated 5 months ago
- Minimal but scalable implementation of large language models in JAX ☆35 · Nov 28, 2025 · Updated 2 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆128 · Oct 9, 2025 · Updated 4 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆129 · Jun 24, 2025 · Updated 7 months ago
- A python algorithm to change the pitch of the voice in real time ☆13 · Dec 13, 2020 · Updated 5 years ago
- ☆15 · Mar 18, 2025 · Updated 10 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- Detailed, bilingually annotated word2vec source code (well-annotated word2vec) ☆10 · Oct 3, 2021 · Updated 4 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆93 · Jan 25, 2024 · Updated 2 years ago
- ring-attention experiments ☆165 · Oct 17, 2024 · Updated last year
- ☆44 · Feb 6, 2026 · Updated last week
- ☆131 · May 29, 2025 · Updated 8 months ago
- CVPR 2023: PAniC-3D, Vtubers dataset downloader ☆13 · Apr 22, 2023 · Updated 2 years ago
- (READ ONLY MIRROR) The ProB Model Checker and Animator Plugin for Rodin ☆19 · Jan 24, 2026 · Updated 3 weeks ago
- ☆16 · Jul 23, 2023 · Updated 2 years ago
- A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gp… ☆16 · Mar 11, 2025 · Updated 11 months ago
- ☆13 · Jul 18, 2022 · Updated 3 years ago
- A simple, hackable text-to-speech system in PyTorch and MLX ☆186 · Aug 3, 2025 · Updated 6 months ago
- H-Net Dynamic Hierarchical Architecture ☆81 · Sep 11, 2025 · Updated 5 months ago
- Code and data for experiments on semantic fragments ☆11 · Jun 23, 2022 · Updated 3 years ago
- ArterialNet reconstructs arterial blood pressure (ABP) waveform ☆13 · Feb 24, 2025 · Updated 11 months ago
- ☆10 · Jun 27, 2024 · Updated last year
- Light and dark variants for Visual Studio Code of the Base16 Grayscale theme by Chris Kempson ☆10 · May 11, 2017 · Updated 8 years ago
- Proxify Molotov.tv DRM to share content publicly ☆10 · Jun 24, 2020 · Updated 5 years ago
- ☆13 · Updated this week