AkiRusProd / numpy-transformer

A numpy implementation of the Transformer model in "Attention is All You Need"

☆58

Alternatives and similar repositories for numpy-transformer

Users who are interested in numpy-transformer are comparing it to the libraries listed below.

hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆353 Updated 2 years ago
hkproj / quantization-notes
Notes on quantization in neural networks
☆104 Updated last year
hkproj / pytorch-lora
LoRA: Low-Rank Adaptation of Large Language Models, implemented using PyTorch
☆116 Updated 2 years ago
jsbaan / transformer-from-scratch
Well-documented, unit-tested, type-checked, and formatted implementation of a vanilla transformer, for educational purposes.
☆262 Updated last year
hkproj / triton-flash-attention
☆206 Updated 9 months ago
gpu-mode / profiling-cuda-in-torch
☆173 Updated last year
lucasdelimanogueira / PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
☆159 Updated last year
lessw2020 / triton_kernels_for_fun_and_profit
Custom kernels in Triton language for accelerating LLMs
☆26 Updated last year
LambdaLabsML / distributed-training-guide
Best practices and guides on how to write distributed PyTorch training code
☆487 Updated 7 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MoE models.
☆196 Updated 7 months ago
huggingface / picotron_tutorial
☆222 Updated last week
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆194 Updated 4 months ago
tspeterkim / paged-attention-minimal
A minimal cache manager for PagedAttention, on top of llama3.
☆123 Updated last year
aykutcayir34 / DifferentialTransformer
☆13 Updated 11 months ago
srush / annotated-mamba
Annotated version of the Mamba paper
☆489 Updated last year
hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆84 Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆261 Updated last month
ash-01xor / bpe.c
Simple byte pair encoding mechanism used for tokenization, written purely in C
☆136 Updated 10 months ago
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆577 Updated last month
kyegomez / FlashAttention20
Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels
☆108 Updated 2 years ago
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆211 Updated last year
PeaBrane / mamba-tiny
Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence).
☆122 Updated 11 months ago
facebookresearch / optimizers
For optimization algorithm research and development.
☆539 Updated 2 weeks ago
tintn / vision-transformer-from-scratch
A Simplified PyTorch Implementation of Vision Transformer (ViT)
☆211 Updated last year
hkproj / multi-latent-attention
☆45 Updated 4 months ago
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆85 Updated 2 weeks ago
rasbt / pytorch-memory-optim
This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po…
☆92 Updated 2 years ago
linjames0 / Transformer-CUDA
An implementation of the transformer architecture as an NVIDIA CUDA kernel
☆190 Updated 2 years ago
coaxsoft / pytorch_bert
Tutorial on how to build BERT from scratch
☆99 Updated last year
johnma2006 / candle
Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
☆52 Updated last year