drisspg / transformer_nuggetsLinks

A place to store reusable transformer components of my own creation or found on the interwebs

☆60

Alternatives and similar repositories for transformer_nuggets

Users that are interested in transformer_nuggets are comparing it to the libraries listed below

Sorting:

srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
UmerHA / triton_util
Make triton easier
☆48Updated last year
cloneofsimo / min-fsdp
☆91Updated last year
lianakoleva / no-libtorch-compile
☆21Updated 7 months ago
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆45Updated last year
lessw2020 / transformer_central
Various transformers for FSDP research
☆38Updated 2 years ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73Updated last year
apoorvkh / torchrunx
Easily run PyTorch on multiple GPUs & machines
☆47Updated 2 weeks ago
mgmalek / efficient_cross_entropy
☆121Updated last year
cat-state / tinypar
☆20Updated 2 years ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated last year
joey00072 / microjax
Jax like function transformation engine but micro, microjax
☆33Updated last year
kshitij12345 / torchnnprofiler
Context Manager to profile the forward and backward times of PyTorch's nn.Module
☆82Updated 2 years ago
srush / Tensor-Puzzles-Penzai
☆21Updated last year
crowsonkb / LDLM
Latent Diffusion Language Models
☆68Updated 2 years ago
rasbt / pytorch-memory-optim
This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po…
☆92Updated 2 years ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆45Updated 10 months ago
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21Updated 4 months ago
modal-labs / ci-on-modal
A sample pattern for running CI tests on Modal
☆18Updated 6 months ago
lernapparat / torchhacks
Hacks for PyTorch
☆19Updated 2 years ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86Updated 3 years ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 3 months ago
google-deepmind / asyncdiloco
☆46Updated last year
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 6 months ago
srush / LLM-Talk
☆52Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
stas00 / ml-ways
ML/DL Math and Method notes
☆64Updated last year
rwightman / genalog
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te…
☆43Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year