shikaiqiu / compute-better-spent
★46 Updated last month
Related projects
Alternatives and complementary repositories for compute-better-spent
- A MAD laboratory to improve AI architecture designs 🧪 ★95 Updated 6 months ago
- ★50 Updated 6 months ago
- ★53 Updated 10 months ago
- ★128 Updated this week
- ★29 Updated 2 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ★79 Updated 9 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ★52 Updated last month
- ★20 Updated 11 months ago
- ★53 Updated 3 weeks ago
- A State-Space Model with Rational Transfer Function Representation. ★70 Updated 6 months ago
- An annotated implementation of the Hyena Hierarchy paper ★31 Updated last year
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ★49 Updated last year
- ★25 Updated last month
- ★35 Updated 7 months ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ★34 Updated last year
- Triton Implementation of HyperAttention Algorithm ★46 Updated 11 months ago
- ★51 Updated 5 months ago
- NanoGPT-like codebase for LLM training ★75 Updated this week
- Efficient PScan implementation in PyTorch ★15 Updated 10 months ago
- ★36 Updated 10 months ago
- ★27 Updated 7 months ago
- ★76 Updated 7 months ago
- Parallelizing non-linear sequential models over the sequence length ★45 Updated 3 weeks ago
- ★39 Updated 10 months ago
- ★31 Updated 2 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ★85 Updated 6 months ago
- Blog post ★16 Updated 9 months ago
- ★45 Updated 9 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ★120 Updated last year
- Normalized Transformer (nGPT) ★66 Updated this week