devvrit / SONew
☆9 · Updated last year
Alternatives and similar repositories for SONew
Users who are interested in SONew are comparing it to the libraries listed below.
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆84 · Updated last year
- Blog post ☆17 · Updated last year
- Parallel Associative Scan for Language Models ☆18 · Updated last year
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆50 · Updated 3 years ago
- ☆32 · Updated last year
- Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021) ☆49 · Updated last month
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 · Updated last year
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 · Updated 4 years ago
- Layerwise Batch Entropy Regularization ☆23 · Updated 2 years ago
- Combining SOAP and MUON ☆16 · Updated 5 months ago
- ☆31 · Updated 7 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆15 · Updated 4 months ago
- JAX/Flax implementation of the Hyena Hierarchy ☆34 · Updated 2 years ago
- ☆31 · Updated last month
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated 9 months ago
- ☆53 · Updated 9 months ago
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Updated last year
- ☆36 · Updated last year
- [ICML'21 Oral] Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding ☆14 · Updated 4 years ago
- ☆56 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Updated last year
- ☆31 · Updated 2 years ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆19 · Updated last month
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 · Updated 2 years ago
- ☆13 · Updated last month
- JAX implementation of Learning to learn by gradient descent by gradient descent ☆27 · Updated 9 months ago
- PyTorch implementation for "Probabilistic Circuits for Variational Inference in Discrete Graphical Models", NeurIPS 2020 ☆17 · Updated 3 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Updated 2 years ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆34 · Updated 4 years ago
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ☆33 · Updated last month