SamsungSAILMontreal / nino
Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [ICLR 2025]
☆24 · Updated 3 weeks ago
Alternatives and similar repositories for nino
Users who are interested in nino are comparing it to the libraries listed below.
- ☆81 · Updated last year
- ☆86 · Updated last year
- ☆13 · Updated 8 months ago
- ☆46 · Updated last year
- ☆60 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" · ☆38 · Updated 4 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) · ☆53 · Updated 2 weeks ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. · ☆40 · Updated 6 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind · ☆129 · Updated last year
- Unofficial implementation of Selective Attention Transformer · ☆17 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models · ☆68 · Updated last year
- A repository for research on medium-sized language models. · ☆78 · Updated last year
- Code for the paper "Function-Space Learning Rates" · ☆23 · Updated 4 months ago
- A single repo with all scripts and utils to train or fine-tune the Mamba model with or without FIM · ☆59 · Updated last year
- Code and pretrained models for the paper "MatMamba: A Matryoshka State Space Model" · ☆61 · Updated 11 months ago
- Triton implementation of the HyperAttention algorithm · ☆48 · Updated last year
- https://x.com/BlinkDL_AI/status/1884768989743882276 · ☆28 · Updated 5 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. · ☆116 · Updated last week
- The official GitHub repo for "Diffusion Language Models are Super Data Learners". · ☆139 · Updated last week
- Fork of the Flame repo for training some new models in development · ☆18 · Updated 3 weeks ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" · ☆103 · Updated 10 months ago
- ☆33 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation · ☆44 · Updated 2 weeks ago
- ☆29 · Updated 4 months ago
- Lottery Ticket Adaptation · ☆40 · Updated 11 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch · ☆25 · Updated 9 months ago
- ☆33 · Updated 9 months ago
- ☆34 · Updated 7 months ago
- Deep Networks Grok All the Time and Here is Why · ☆37 · Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. · ☆18 · Updated 3 months ago