skolai / fewbitLinks

Compression schema for gradients of activations in backward pass

☆44

Alternatives and similar repositories for fewbit

Users that are interested in fewbit are comparing it to the libraries listed below

Sorting:

zhuchen03 / gradinit
Learning to Initialize Neural Networks for Stable and Efficient Training
☆139Updated 3 years ago
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46Updated last year
TezRomacH / layer-to-layer-pytorch
PyTorch implementation of L2L execution algorithm
☆107Updated 2 years ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆125Updated 7 months ago
lernapparat / torchhacks
Hacks for PyTorch
☆19Updated 2 years ago
mgmalek / efficient_cross_entropy
☆112Updated last year
jfainberg / hashed_nets
PyTorch implementation of HashedNets
☆36Updated 2 years ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79Updated last year
pytorch-labs / superblock
A block oriented training approach for inference time optimization.
☆33Updated 10 months ago
KhrulkovV / tt-pytorch
☆60Updated 5 years ago
corl-team / lime
Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity"
☆29Updated last month
yandex-research / DeDLOC
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
☆117Updated 3 years ago
ahennequ / pytorch-custom-mma
☆29Updated 2 years ago
JeanKaddour / LAWA
Latest Weight Averaging (NeurIPS HITY 2022)
☆30Updated 2 years ago
gregorbachmann / scaling_mlps
☆51Updated last year
FusionBrainLab / LLM-Microscope
☆70Updated 10 months ago
jiaweizzhao / ZerO-initialization
☆74Updated 2 years ago
kshitij12345 / torchnnprofiler
Context Manager to profile the forward and backward times of PyTorch's nn.Module
☆83Updated last year
optimizedlearning / mechanic
☆36Updated last year
RobertCsordas / moe_layer
sigma-MoE layer
☆20Updated last year
chenjoya / dropit
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023)
☆31Updated 2 years ago
RuslanKhalitov / ChordMixer
The official implementation of the ChordMixer architecture.
☆61Updated 2 years ago
topal-team / rockmate
☆36Updated 7 months ago
shikaiqiu / compute-better-spent
☆53Updated 9 months ago
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆56Updated 3 years ago
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆44Updated 2 years ago
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆157Updated last year
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆23Updated 3 weeks ago
AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch
☆184Updated 2 years ago
chrundle / biprop
Identify a binary weight or binary weight and activation subnetwork within a randomly initialized network by only pruning and binarizing …
☆52Updated 3 years ago