m-a-n-i-f-e-s-t / power-attention
Attention Kernels for Symmetric Power Transformers
★115 · Updated 3 weeks ago
Alternatives and similar repositories for power-attention
Users who are interested in power-attention are comparing it to the libraries listed below.
- ★102 · Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ★129 · Updated 9 months ago
- seqax = sequence modeling + JAX ★167 · Updated 2 months ago
- A JAX-native LLM Post-Training Library ★150 · Updated this week
- DeMo: Decoupled Momentum Optimization ★191 · Updated 9 months ago
- supporting pytorch FSDP for optimizers ★84 · Updated 9 months ago
- Accelerated First Order Parallel Associative Scan ★188 · Updated last year
- Small Batch Size Training for Language Models ★62 · Updated 3 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ★162 · Updated 3 months ago
- Understand and test language model architectures on synthetic tasks. ★226 · Updated this week
- Minimal yet performant LLM examples in pure JAX ★160 · Updated this week
- ★53 · Updated last year
- ★67 · Updated 10 months ago
- Normalized Transformer (nGPT) ★190 · Updated 10 months ago
- ★187 · Updated last month
- H-Net Dynamic Hierarchical Architecture ★79 · Updated 2 weeks ago
- A set of Python scripts that makes your experience on TPU better ★54 · Updated last week
- 🧱 Modula software package ★239 · Updated last month
- ★58 · Updated 11 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ★132 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ★129 · Updated 9 months ago
- ★53 · Updated last year
- Experiment of using Tangent to autodiff triton ★81 · Updated last year
- ★40 · Updated last year
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ★105 · Updated 6 months ago
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ★141 · Updated 5 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ★109 · Updated 5 months ago
- ★281 · Updated last year
- A simple library for scaling up JAX programs ★143 · Updated 10 months ago
- Minimal but scalable implementation of large language models in JAX ★35 · Updated 3 weeks ago