bhavnicksm / vanilla-transformer-jaxLinks

JAX/Flax implimentation of 'Attention Is All You Need' by Vaswani et al. (https://arxiv.org/abs/1706.03762)

☆15

Alternatives and similar repositories for vanilla-transformer-jax

Users that are interested in vanilla-transformer-jax are comparing it to the libraries listed below

Sorting:

xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆84Updated last year
dylandoblar / noether-networks
Meta-learning inductive biases in the form of useful conserved quantities.
☆37Updated 2 years ago
lucidrains / PaLM-jax
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)
☆187Updated 3 years ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintable TPU-Training
☆50Updated last year
zphang / minimal-opt
☆67Updated 2 years ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆139Updated last year
wattenberg / superposition
Code associated to papers on superposition (in ML interpretability)
☆28Updated 2 years ago
davisyoshida / haiku-mup
A port of muP to JAX/Haiku
☆25Updated 2 years ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆36Updated 2 years ago
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
leonard-gleyzer / connex
Fine-grained, dynamic control of neural network topology in JAX.
☆21Updated last year
conceptofmind / vit-flax
Implementation of numerous Vision Transformers in Google's JAX and Flax.
☆22Updated 2 years ago
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68Updated 2 years ago
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, dont use it for Adam
☆81Updated 11 months ago
dvruette / barrel-rec-pytorch
☆53Updated last year
graphcore-research / pytorch-tensor-tracker
Flexibly track outputs and grad-outputs of torch.nn.Module.
☆13Updated last year
nreimers / flax-sentence-embeddings
Shared code for training sentence embeddings with Flax / JAX
☆27Updated 3 years ago
huggingface / bloom-jax-inference
☆67Updated 2 years ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆83Updated last year
abhishekpanigrahi1996 / transformer_in_transformer
☆45Updated last year
radarFudan / mamba-minimal-jax
☆31Updated 7 months ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆81Updated 3 years ago
epfml / DenseFormer
☆81Updated last year
google-deepmind / jmp
JMP is a Mixed Precision library for JAX.
☆204Updated 5 months ago
sooheon / perceiver-jax
Perceiver (transformer variant) implemented in JAX and Flax
☆12Updated 4 years ago
srush / mamba-scans
Blog post
☆17Updated last year
cat-state / tinypar
☆20Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆82Updated 6 months ago
SeanNaren / minGPT
A minimal PyTorch Lightning OpenAI GPT w DeepSpeed Training!
☆111Updated 2 years ago
srush / anynp
Proof-of-concept of global switching between numpy/jax/pytorch in a library.
☆18Updated last year