facebookresearch / diffq

DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights, in order to achieve a given trade-off between model size and accuracy.
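As a rough illustration of the idea described above, the sketch below compares real uniform quantization (rounding) with a training-time stand-in that adds noise of the same scale as the rounding error. It is not the DiffQ API: the function names (`quant_step`, `quantize`, `pseudo_quant_noise`), the min/max range, and the choice of uniform noise are all assumptions made for this example. Because the noise scale depends smoothly on `bits`, the bit width can be treated as a continuous value, which is what would let it be tuned alongside the weights.

```python
import random


def quant_step(w_min, w_max, bits):
    """Step size of a uniform quantizer with 2**bits levels over [w_min, w_max].

    `bits` may be fractional, so the step is a smooth function of the bit width.
    """
    return (w_max - w_min) / (2 ** bits - 1)


def quantize(weights, bits):
    """Real uniform quantization: round each weight to the nearest level.

    Rounding has zero gradient almost everywhere, so it cannot be trained through.
    """
    w_min, w_max = min(weights), max(weights)
    step = quant_step(w_min, w_max, bits)
    if step == 0:  # all weights identical, nothing to quantize
        return list(weights)
    return [w_min + round((w - w_min) / step) * step for w in weights]


def pseudo_quant_noise(weights, bits, rng):
    """Hypothetical training-time stand-in for rounding (assumption: uniform noise).

    Adds independent noise in [-step/2, step/2], the same range as the rounding
    error, instead of rounding the weights.
    """
    w_min, w_max = min(weights), max(weights)
    step = quant_step(w_min, w_max, bits)
    return [w + step * rng.uniform(-0.5, 0.5) for w in weights]


# Example usage with made-up weights and a fixed seed for repeatability.
weights = [-1.0, -0.3, 0.2, 0.9]
rng = random.Random(0)
rounded = quantize(weights, bits=4)
noisy = pseudo_quant_noise(weights, bits=4, rng=rng)
```

Fewer bits give a larger step and therefore larger noise, so a size penalty on `bits` can be traded off against the accuracy lost to that noise.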

☆236

Alternatives and similar repositories for diffq

Users who are interested in diffq are comparing it to the libraries listed below.

AminRezaei0x443 / memory-efficient-attention
Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch
☆184 Updated 2 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆218 Updated 2 years ago
HomebrewML / revlib
Simple and efficient RevNet-Library for PyTorch with XLA and DeepSpeed support and parameter offload
☆131 Updated 3 years ago
yandex-research / DeDLOC
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
☆118 Updated 3 years ago
lmnt-com / graftr
graftr: an interactive shell to view and edit PyTorch checkpoints.
☆114 Updated 5 years ago
Cerebras / online-normalization
Online Normalization for Training Neural Networks (Companion Repository)
☆86 Updated 4 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆277 Updated 3 years ago
pseeth / autoclip
Adaptive Gradient Clipping
☆151 Updated 3 years ago
jongwook / tfrecord_lite
Make TFRecord Usable Again
☆90 Updated 2 years ago
TezRomacH / layer-to-layer-pytorch
PyTorch implementation of L2L execution algorithm
☆109 Updated 2 years ago
lucidrains / PaLM-jax
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)
☆189 Updated 3 years ago
cfoster0 / CLAP
Contrastive Language-Audio Pretraining
☆88 Updated 3 years ago
google-research / diffstride
TF/Keras code for DiffStride, a pooling layer with learnable strides.
☆124 Updated 3 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207 Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆301 Updated 2 years ago
ctlllll / SGConv
☆164 Updated 2 years ago
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68 Updated 3 years ago
facebookresearch / flashy
Framework for writing deep learning training loops. Lightweight, and retaining full freedom to design as you see fit. It handles checkpo…
☆116 Updated last year
romesco / hydra-lightning
Configuration classes enabling Hydra to configure and manage Pytorch Lightning projects.
☆43 Updated 4 years ago
lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in Pytorch
☆177 Updated 4 years ago
mit-han-lab / neurips-micronet
[JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion
☆41 Updated 4 years ago
lucidrains / Adan-pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆252 Updated 3 years ago
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆57 Updated 3 years ago
pytorch / nestedtensor
[Prototype] Tools for the concurrent manipulation of variably sized Tensors.
☆251 Updated 3 years ago
lucidrains / hourglass-transformer-pytorch
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
☆97 Updated 3 years ago
tmbdev-archive / webdataset-lightning
A small demonstration of using WebDataset with ImageNet and PyTorch Lightning
☆75 Updated last year
archinetai / surgeon-pytorch
A library to inspect and extract intermediate layers of PyTorch models.
☆474 Updated 3 years ago
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆221 Updated last year
facebookresearch / gtn_applications
Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"
☆83 Updated 3 years ago
google-research / jestimator
Amos optimizer with JEstimator lib.
☆82 Updated last year