rimads / avey-dpa
Code for the paper "Don't Pay Attention"
★51 · Updated 4 months ago
Alternatives and similar repositories for avey-dpa
Users interested in avey-dpa are comparing it to the libraries listed below.
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" (see the sketch after this list) ★103 · Updated last year
- Small Batch Size Training for Language Models ★80 · Updated 3 months ago
- Collection of autoregressive model implementations ★85 · Updated 2 weeks ago
- ★82 · Updated last year
- Train a SmolLM-style LLM on FineWeb-Edu in JAX/Flax with an assortment of optimizers. ★18 · Updated 6 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nanoGPT speedrun ★56 · Updated 10 months ago
- H-Net Dynamic Hierarchical Architecture ★81 · Updated 4 months ago
- Supporting code for the blog post on modular manifolds. ★113 · Updated 4 months ago
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training ★132 · Updated last year
- ★70 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ★133 · Updated 2 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ★28 · Updated 8 months ago
- ★91 · Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ★82 · Updated 2 months ago
- A State-Space Model with Rational Transfer Function Representation. ★83 · Updated last year
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ★67 · Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ★73 · Updated 9 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch (see the sketch after this list) ★25 · Updated last year
- Universal Reasoning Model ★121 · Updated 2 weeks ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ★125 · Updated 2 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ★186 · Updated last week
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere); see the sketch after this list ★109 · Updated 10 months ago
- Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models) ★34 · Updated 11 months ago
- ★19 · Updated last month
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding ★193 · Updated 2 weeks ago
- EvaByte: Efficient Byte-level Language Models at Scale ★115 · Updated 9 months ago
- ★35 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ★70 · Updated last year
- ★109 · Updated 6 months ago
- ★62 · Updated last year
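
A few of the entries above describe techniques simple enough to sketch. For the Grokfast item: the paper's core trick is to keep an exponential moving average of each parameter's gradient and add an amplified copy of it back, boosting the slow-varying gradient component. A minimal sketch of the EMA variant, assuming a standard PyTorch training loop; the names `grads_ema`, `alpha`, and `lam` follow the paper's notation but are illustrative, not the repo's API:

```python
import torch

def grokfast_ema(model, grads_ema, alpha=0.98, lam=2.0):
    # Amplify the slow (low-frequency) gradient component: g <- g + lam * EMA(g).
    # Call after loss.backward() and before optimizer.step().
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        g = p.grad.detach()
        ema = grads_ema.get(name)
        grads_ema[name] = g.clone() if ema is None else alpha * ema + (1 - alpha) * g
        p.grad = p.grad + lam * grads_ema[name]
    return grads_ema
```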
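For the Gradient Agreement Filtering item: the idea (per Chaubard et al.) is to combine microbatch gradients only when they agree, measured by cosine distance. A hedged sketch for the two-microbatch case; the threshold `tau` and the flattened-vector interface are illustrative assumptions, not the repo's actual API:

```python
import torch

def filter_and_combine(g1, g2, tau=0.98):
    # g1, g2: gradients from two microbatches, each flattened to a 1-D vector.
    # Cosine distance lies in [0, 2]; ~1 means orthogonal. Combine only on agreement.
    cos_dist = 1.0 - torch.nn.functional.cosine_similarity(g1, g2, dim=0)
    if cos_dist < tau:
        return 0.5 * (g1 + g2)  # gradients agree: average them
    return None  # gradients disagree: caller skips this optimizer step
```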
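And for the nGPT reproduction: the defining move is keeping every hidden state on the unit hypersphere, replacing each residual addition with a normalized interpolation toward the block output. A minimal sketch assuming a scalar step size `alpha` (the paper learns per-dimension "eigen learning rates" instead):

```python
import torch
import torch.nn.functional as F

def ngpt_residual_update(h, block_out, alpha=0.05):
    # h, block_out: (..., d) hidden states. Both are projected onto the unit
    # hypersphere, then h steps toward block_out and is renormalized:
    #   h <- Norm(h + alpha * (block_out - h))
    h = F.normalize(h, dim=-1)
    block_out = F.normalize(block_out, dim=-1)
    return F.normalize(h + alpha * (block_out - h), dim=-1)
```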