devvrit / SONew
☆9 · Updated last year
Alternatives and similar repositories for SONew:
Users who are interested in SONew are comparing it to the libraries listed below.
- Blog post ☆17 · Updated last year
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 · Updated last year
- ☆34 · Updated 3 months ago
- Efficient Scaling laws and collaborative pretraining. ☆16 · Updated 2 months ago
- ☆18 · Updated 8 months ago
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆14 · Updated 4 months ago
- ☆31 · Updated 11 months ago
- ☆24 · Updated 2 years ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 · Updated 3 weeks ago
- Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks ☆10 · Updated 9 months ago
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆58 · Updated last year
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS '22) ☆16 · Updated 2 years ago
- ☆30 · Updated 5 months ago
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- ☆30 · Updated 4 months ago
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆28 · Updated 4 years ago
- Transformers with doubly stochastic attention ☆45 · Updated 2 years ago
- Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021) ☆48 · Updated 2 years ago
- [ICML'21 Oral] Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding ☆14 · Updated 3 years ago
- Implementation of the models and datasets used in "An Information-theoretic Approach to Distribution Shifts" ☆25 · Updated 3 years ago
- ☆52 · Updated 5 months ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · Updated last week
- Code for Neural Execution Engines: Learning to Execute Subroutines ☆17 · Updated 4 years ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- Parallel Associative Scan for Language Models ☆18 · Updated last year
- ☆33 · Updated last year
- ☆23 · Updated 6 months ago
- Efficient PScan implementation in PyTorch ☆16 · Updated last year
- Experiments on GPT-3's ability to fit numerical models in-context. ☆14 · Updated 2 years ago
- Code for Augment & Reduce, a scalable stochastic algorithm for large categorical distributions ☆10 · Updated 6 years ago