Implementation of a Transformer, but completely in Triton
☆279 · Apr 5, 2022 · Updated 4 years ago
Alternatives and similar repositories for triton-transformer
Users that are interested in triton-transformer are comparing it to the libraries listed below.
- GPT, but made only out of MLPs ☆89 · May 25, 2021 · Updated 4 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆49 · Jan 27, 2022 · Updated 4 years ago
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆81 · Oct 30, 2021 · Updated 4 years ago
- Implementation of Multistream Transformers in Pytorch ☆54 · Jul 31, 2021 · Updated 4 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Feb 13, 2023 · Updated 3 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆36 · Sep 27, 2021 · Updated 4 years ago
- Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI ☆98 · Dec 31, 2021 · Updated 4 years ago
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,585 · Jan 28, 2026 · Updated 2 months ago
- Another attempt at a long-context / efficient transformer by me ☆38 · Apr 11, 2022 · Updated 3 years ago
- Contrastive Language-Image Pretraining ☆145 · Sep 6, 2022 · Updated 3 years ago
- GPTQ inference Triton kernel ☆321 · May 18, 2023 · Updated 2 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆598 · Aug 12, 2025 · Updated 7 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch ☆39 · Mar 29, 2022 · Updated 4 years ago
- A python library for highly configurable transformers - easing model architecture search and experimentation. ☆49 · Nov 30, 2021 · Updated 4 years ago
- Implementation of Fast Transformer in Pytorch ☆176 · Aug 26, 2021 · Updated 4 years ago
- An open source implementation of CLIP. ☆33 · Nov 7, 2022 · Updated 3 years ago
- Implementation of Flash Attention in Jax ☆227 · Mar 1, 2024 · Updated 2 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch ☆120 · Aug 4, 2021 · Updated 4 years ago
- Cataloging released Triton kernels. ☆299 · Sep 9, 2025 · Updated 7 months ago
- ☆30 · Oct 3, 2022 · Updated 3 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Apr 6, 2022 · Updated 4 years ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 10 months ago
- jax-triton contains integrations between JAX and OpenAI Triton ☆442 · Mar 26, 2026 · Updated 2 weeks ago
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Jan 23, 2024 · Updated 2 years ago
- ☆51 · Jan 28, 2024 · Updated 2 years ago
- Development repository for the Triton language and compiler ☆18,840 · Updated this week
- Type annotations and dynamic checking for a tensor's shape, dtype, names, etc. ☆1,476 · May 2, 2025 · Updated 11 months ago
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework) ☆190 · Jun 24, 2022 · Updated 3 years ago
- A concise but complete implementation of CLIP with various experimental improvements from recent papers ☆722 · Oct 16, 2023 · Updated 2 years ago
- A GPT, made only of MLPs, in Jax ☆59 · Jun 23, 2021 · Updated 4 years ago
- ☆19 · Dec 4, 2025 · Updated 4 months ago
- ☆109 · Mar 12, 2026 · Updated 3 weeks ago
- A collection of memory efficient attention operators implemented in the Triton language. ☆289 · Jun 5, 2024 · Updated last year
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation ☆90 · Oct 11, 2024 · Updated last year
- Source code for "N-ary Constituent Tree Parsing with Recursive Semi-Markov Model" published at ACL 2021 ☆10 · May 27, 2021 · Updated 4 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Aug 26, 2023 · Updated 2 years ago
- ☆21 · Mar 15, 2023 · Updated 3 years ago
- Experiment of using Tangent to autodiff triton ☆82 · Jan 22, 2024 · Updated 2 years ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆68 · Apr 24, 2024 · Updated last year