m-a-n-i-f-e-s-t / power-attentionLinks

Attention Kernels for Symmetric Power Transformers

☆128

Alternatives and similar repositories for power-attention

Users that are interested in power-attention are comparing it to the libraries listed below

Sorting:

athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135Updated 11 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197Updated 11 months ago
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆105Updated 4 months ago
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80Updated 2 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 11 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆173Updated 5 months ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168Updated 4 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆192Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 11 months ago
apple / ml-ademamix
☆68Updated last year
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185Updated 10 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240Updated 2 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆194Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆111Updated 7 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108Updated 8 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆63Updated last month
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆302Updated last month
berlino / seq_icl
☆53Updated last year
modula-systems / modula
🧱 Modula software package
☆307Updated 3 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆103Updated 2 months ago
iliao2345 / CompressARC
☆201Updated 3 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆112Updated last month
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆74Updated 2 weeks ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆243Updated 5 months ago
cloneofsimo / min-fsdp
☆91Updated last year
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated 2 months ago
epfml / DenseFormer
☆82Updated last year
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆202Updated 2 months ago