graphcore-research / pytorch-tensor-trackerLinks

Flexibly track outputs and grad-outputs of torch.nn.Module.

☆13

Alternatives and similar repositories for pytorch-tensor-tracker

Users that are interested in pytorch-tensor-tracker are comparing it to the libraries listed below

Sorting:

cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, dont use it for Adam
☆83Updated 11 months ago
cloneofsimo / insightful-nn-papers
These papers will provide unique insightful concepts that will broaden your perspective on neural networks and deep learning
☆48Updated last year
cloneofsimo / min-fsdp
☆81Updated last year
young-geng / scalax
A simple library for scaling up JAX programs
☆139Updated 8 months ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆140Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆84Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79Updated last year
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35Updated last year
davisyoshida / qax
If it quacks like a tensor...
☆58Updated 8 months ago
cloneofsimo / scaling-guide
WIP
☆93Updated 11 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆149Updated 3 weeks ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆85Updated 7 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆129Updated last year
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆90Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆22Updated last month
thecharlieblake / lovely-llama
An implementation of the Llama architecture, to instruct and delight
☆21Updated last month
yixiaoer / einshard
Einsum-like high-level array sharding API for JAX
☆35Updated last year
kvfrans / splus
☆114Updated last month
shikaiqiu / compute-better-spent
☆53Updated 9 months ago
yixiaoer / tpu-training-example
☆14Updated last year
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆85Updated last year
srush / mamba-primer
☆37Updated last year
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆183Updated 11 months ago
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆55Updated last year
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆128Updated 2 weeks ago
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆31Updated last week
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated last week
modula-systems / modula
🧱 Modula software package
☆209Updated 3 months ago
sustcsonglin / mamba-triton
☆48Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆82Updated 3 years ago