joey00072 / microjaxLinks

Jax like function transformation engine but micro, microjax

☆33

Alternatives and similar repositories for microjax

Users that are interested in microjax are comparing it to the libraries listed below

Sorting:

okarthikb / state-space-models
☆28Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆32Updated 6 months ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆92Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103Updated 11 months ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆62Updated 2 weeks ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
tyler-romero / microR1
Simple repository for training small reasoning models
☆46Updated 9 months ago
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆18Updated 4 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73Updated 7 months ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23Updated 6 months ago
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆105Updated 4 months ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18Updated last year
YuchenJin / llm.c
LLM training in simple, raw C/CUDA
☆15Updated last year
google-deepmind / spectral_ssm
☆34Updated last year
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 7 months ago
google-deepmind / dks
Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo…
☆74Updated 5 months ago
srush / Tensor-Puzzles-Penzai
☆21Updated last year
catid / spectral_ssm
Implementation of Spectral State Space Models
☆16Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
epfml / DenseFormer
☆82Updated last year
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80Updated 2 months ago
HazyResearch / train-tk
train with kittens!
☆63Updated last year
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59Updated 10 months ago
shikaiqiu / compute-better-spent
☆61Updated last year
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25Updated 10 months ago
crowsonkb / dice-mc
DiCE: The Infinitely Differentiable Monte-Carlo Estimator
☆32Updated 2 years ago
sholtodouglas / scalingExperiments
☆62Updated 3 years ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year