RuslanKhalitov / ChordMixer
The official implementation of the ChordMixer architecture.
☆61 · Updated last year
Alternatives and similar repositories for ChordMixer:
Users who are interested in ChordMixer are comparing it to the libraries listed below:
- Official code for Long Expressive Memory (ICLR 2022, Spotlight) ☆69 · Updated 2 years ago
- Deep Learning Audio Course – AI Masters ☆28 · Updated 9 months ago
- Compression schema for gradients of activations in backward pass ☆44 · Updated last year
- AdaCat ☆49 · Updated 2 years ago
- ☆71 · Updated 5 months ago
- Lightweight knowledge distillation pipeline ☆28 · Updated 3 years ago
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- ☆20 · Updated 7 months ago
- ☆17 · Updated 2 months ago
- FID computation in Jax/Flax. ☆26 · Updated 7 months ago
- FusionBrain Challenge 2.0: creating multimodal multitask model ☆16 · Updated 2 years ago
- ☆36 · Updated last year
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆72 · Updated this week
- Very simple and short implementation of gradient boosting in 18 lines of code ☆9 · Updated 4 years ago
- Framework for processing and filtering datasets ☆27 · Updated 6 months ago
- ☆30 · Updated 2 months ago
- ☆21 · Updated last year
- ☆31 · Updated 2 years ago
- Another attempt at a long-context / efficient transformer by me ☆37 · Updated 2 years ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆55 · Updated 9 months ago
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆14 · Updated 3 months ago
- Code for the paper "PALBERT: Teaching ALBERT to Ponder", NeurIPS 2022 Spotlight ☆37 · Updated last year
- Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization" ☆99 · Updated 2 years ago
- T5-based (Russian) text normalization ☆20 · Updated last year
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆80 · Updated 3 years ago
- GULAG: GUessing LAnGuages with neural networks ☆13 · Updated 2 years ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated last month
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆102 · Updated 2 months ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year