devvrit / SONew
☆9 · Updated last year
Alternatives and similar repositories for SONew:
Users who are interested in SONew are comparing it to the libraries listed below.
- ☆31 · Updated 10 months ago
- Blog post ☆16 · Updated last year
- Layerwise Batch Entropy Regularization ☆22 · Updated 2 years ago
- Codes for the paper The emergence of clusters in self-attention dynamics. ☆14 · Updated last year
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆58 · Updated last year
- The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers" ☆17 · Updated 6 months ago
- ☆34 · Updated 2 months ago
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆27 · Updated 4 years ago
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning" ☆14 · Updated 3 weeks ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · Updated this week
- ☆24 · Updated 2 years ago
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS 22') ☆16 · Updated 2 years ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] ☆18 · Updated last month
- ☆49 · Updated 7 months ago
- Efficient Scaling laws and collaborative pretraining. ☆14 · Updated 3 weeks ago
- Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers" (NeurIPS 2021) ☆48 · Updated last year
- ☆52 · Updated 4 months ago
- ☆11 · Updated 2 years ago
- ☆24 · Updated 4 months ago
- ☆33 · Updated last year
- A repo based on XiLin Li's PSGD repo that extends some of the experiments. ☆14 · Updated 4 months ago
- Efficient PScan implementation in PyTorch ☆15 · Updated last year
- Jupyter Notebook corresponding to 'Going with the Flow: An Introduction to Normalizing Flows' ☆25 · Updated 3 years ago
- [ICML'21 Oral] Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding ☆14 · Updated 3 years ago
- ☆15 · Updated last year
- Parallel Associative Scan for Language Models ☆18 · Updated last year
- Latest Weight Averaging (NeurIPS HITY 2022) ☆28 · Updated last year
- Euclidean Wasserstein-2 optimal transportation ☆44 · Updated last year
- Transformers with doubly stochastic attention ☆45 · Updated 2 years ago