irhum / hyenaLinks

JAX/Flax implementation of the Hyena Hierarchy

☆34

Alternatives and similar repositories for hyena

Users that are interested in hyena are comparing it to the libraries listed below

Sorting:

expz / annotated-hyena
An annotated implementation of the Hyena Hierarchy paper
☆34Updated 2 years ago
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year
samblouir / birdie
☆13Updated 5 months ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆88Updated last year
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
EleutherAI / mdl
Minimum Description Length probing for neural network representations
☆20Updated 9 months ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆17Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86Updated 3 years ago
srush / mamba-scans
Blog post
☆17Updated last year
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆58Updated 3 years ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintable TPU-Training
☆51Updated last year
LiibanMo / scikit-jax
Your favourite classical machine learning algos on the GPU/TPU
☆20Updated 9 months ago
HazyResearch / embroid
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
☆11Updated 2 years ago
shikaiqiu / compute-better-spent
☆58Updated last year
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52Updated 2 years ago
jxiw / BiGS
Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆114Updated last year
sustcsonglin / gated_linear_attention_layer
☆31Updated last year
google-research / precondition
☆31Updated 4 months ago
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆34Updated 2 years ago
lucidrains / autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
☆45Updated 2 years ago
lucidrains / flash-genomics-model
My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h…
☆54Updated 2 years ago
dylandoblar / noether-networks
Meta-learning inductive biases in the form of useful conserved quantities.
☆37Updated 2 years ago
codekansas / rwkv
RWKV model implementation
☆38Updated 2 years ago
sustcsonglin / mamba-triton
☆48Updated last year
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆20Updated 2 years ago
Doraemonzzz / hgru-pytorch
☆28Updated last year
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
radarFudan / mamba-minimal-jax
☆34Updated 11 months ago