RuslanKhalitov / ChordMixer

The official implementation of the ChordMixer architecture.

☆61

Alternatives and similar repositories for ChordMixer

Users who are interested in ChordMixer are comparing it to the repositories listed below.

tk-rusch / LEM
Official code for Long Expressive Memory (ICLR 2022, Spotlight)
☆71 Updated 3 years ago
gcambara / cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
☆42 Updated 2 years ago
lucidrains / perceiver-ar-pytorch
Implementation of Perceiver AR, Deepmind's new long-context attention network based on Perceiver architecture, in Pytorch
☆91 Updated 2 years ago
ctlllll / SGConv
☆164 Updated 2 years ago
peerdavid / layerwise-batch-entropy
Layerwise Batch Entropy Regularization
☆23 Updated 3 years ago
skolai / fewbit
Compression schema for gradients of activations in backward pass
☆44 Updated 2 years ago
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆81 Updated 3 years ago
thjashin / multires-conv
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
☆127 Updated 2 years ago
liu-ziyin / NeurIPS_2020_Snake
☆31 Updated 4 years ago
esceptico / perceiver-io
Unofficial implementation of Perceiver IO
☆128 Updated 3 years ago
zhuchen03 / gradinit
Learning to Initialize Neural Networks for Stable and Efficient Training
☆139 Updated 3 years ago
ColinQiyangLi / AdaCat
AdaCat
☆49 Updated 3 years ago
TomFrederik / grokking
Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'
☆38 Updated 3 years ago
jiaweizzhao / ZerO-initialization
☆75 Updated 2 years ago
esceptico / squeezer
Lightweight knowledge distillation pipeline
☆28 Updated 3 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 Updated 2 years ago
aliutkus / spe
Relative Positional Encoding for Transformers with Linear Complexity
☆65 Updated 3 years ago
pseeth / autoclip
Adaptive Gradient Clipping
☆149 Updated 3 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆206 Updated 2 years ago
lucidrains / kalman-filtering-attention
Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"
☆59 Updated last year
shyamsn97 / hyper-nn
Easy Hypernetworks in Pytorch and Jax
☆105 Updated 2 years ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90 Updated last year
gisilvs / AEF
☆33 Updated 2 years ago
lucidrains / axial-positional-embedding
Axial Positional Embedding for Pytorch
☆83 Updated 7 months ago
lucidrains / hourglass-transformer-pytorch
Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI
☆94 Updated 3 years ago
yoyolicoris / variational-diffwave
☆32 Updated 3 years ago
Newbeeer / Anytime-Auto-Regressive-Model
Code for ICLR 2021 Paper, "Anytime Sampling for Autoregressive Models via Ordered Autoencoding"
☆26 Updated 2 years ago
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆58 Updated 3 years ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆129 Updated this week
crowsonkb / LDLM
Latent Diffusion Language Models
☆68 Updated 2 years ago