guillaumeBellec / multitask
☆22 · Updated last month
Related projects
Alternatives and complementary repositories for multitask
- ☆164 · Updated last year
- A home for audio ML in JAX. Has common features, learnable frontends, pretrained supervised and self-supervised models. ☆62 · Updated 2 years ago
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch ☆77 · Updated 10 months ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆207 · Updated last year
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation ☆82 · Updated last month
- Code repository for the ICLR 2022 paper "FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes" https://openreview.ne… ☆115 · Updated last year
- TF/Keras code for DiffStride, a pooling layer with learnable strides. ☆124 · Updated 2 years ago
- Sequence Modeling with Structured State Spaces ☆60 · Updated 2 years ago
- Jax/Flax implementation of Variational-DiffWave. ☆40 · Updated 2 years ago
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆13 · Updated 2 weeks ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆16 · Updated this week
- ☆129 · Updated last week
- Explorations into the recently proposed Taylor Series Linear Attention ☆90 · Updated 3 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆94 · Updated this week
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 · Updated last year
- Framework for writing deep learning training loops. Lightweight, and retaining full freedom to design as you see fit. It handles checkpo… ☆103 · Updated 8 months ago
- ☆62 · Updated 3 months ago
- Implementation of Perceiver AR, Deepmind's new long-context attention network based on Perceiver architecture, in Pytorch ☆86 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆164 · Updated 3 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆52 · Updated last month
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆95 · Updated last year
- ☆46 · Updated last month
- Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI ☆84 · Updated 2 years ago
- The 2D discrete wavelet transform for JAX ☆38 · Updated last year
- A library for unit scaling in PyTorch ☆105 · Updated 2 weeks ago
- Multidimensional indexing for tensors ☆113 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 · Updated last year
- Scalable and Performant Data Loading ☆68 · Updated this week
- Inspired by "Neural Networks Fail to Learn Periodic Functions and How to Fix It" ☆59 · Updated 6 months ago
- Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization" ☆95 · Updated 2 years ago