RuslanKhalitov / ChordMixer
The official implementation of the ChordMixer architecture.
☆61 Updated 2 years ago
Alternatives and similar repositories for ChordMixer
Users who are interested in ChordMixer are comparing it to the repositories listed below.
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆69 Updated 3 years ago
- Compression schema for gradients of activations in backward pass ☆44 Updated last year
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆14 Updated 7 months ago
- This is the repo for DenseAttention and DANet, a fast and conceptually simple modification of standard attention and the Transformer ☆11 Updated last week
- FusionBrain Challenge 2.0: creating a multimodal multitask model ☆16 Updated 2 years ago
- Layerwise Batch Entropy Regularization ☆23 Updated 2 years ago
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆28 Updated 3 weeks ago
- Deep Learning Audio Course – AI Masters ☆32 Updated last month
- ☆70 Updated 10 months ago
- ☆22 Updated last year
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 Updated 4 years ago
- ☆36 Updated last year
- ☆18 Updated 2 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch ☆25 Updated 5 months ago
- Deep Generative Models course, 2021 ☆22 Updated 3 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 Updated 9 months ago
- A repo based on XiLin Li's PSGD repo that extends some of the experiments. ☆14 Updated 8 months ago
- ☆31 Updated 7 months ago
- This is the official repo for Gradient Agreement Filtering (GAF). ☆24 Updated 5 months ago
- ☆21 Updated 3 weeks ago
- Implementation of GateLoop Transformer in PyTorch and JAX ☆89 Updated last year
- Very simple and short implementation of gradient boosting in 18 lines of code ☆9 Updated 4 years ago
- FID computation in JAX/Flax. ☆27 Updated 11 months ago
- Lightweight knowledge distillation pipeline ☆28 Updated 3 years ago
- RuTransform: python framework for adversarial attacks and text data augmentation for Russian ☆19 Updated 2 years ago
- Deep Networks Grok All the Time and Here is Why ☆37 Updated last year
- ☆20 Updated 11 months ago
- GULAG: GUessing LAnGuages with neural networks ☆13 Updated 3 years ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆56 Updated last year
- Single-line inference of SOTA deep learning models ☆29 Updated 2 years ago