m-a-n-i-f-e-s-t / power-attention
Attention Kernels for Symmetric Power Transformers
☆53 · Updated 4 months ago
Alternatives and similar repositories for power-attention
Users interested in power-attention are comparing it to the libraries listed below.
- Learning Universal Predictors ☆74 · Updated 10 months ago
- ☆38 · Updated 10 months ago
- look how they massacred my boy ☆63 · Updated 7 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆116 · Updated 5 months ago
- ☆33 · Updated 3 months ago
- ☆95 · Updated 4 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆101 · Updated last month
- ☆44 · Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆147 · Updated 4 months ago
- Plotting (entropy, varentropy) for small LMs ☆97 · Updated 2 weeks ago
- ☆154 · Updated last month
- ☆27 · Updated 10 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 · Updated 6 months ago
- ☆111 · Updated 5 months ago
- A reading list of relevant papers and projects on foundation model annotation ☆27 · Updated 3 months ago
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 7 months ago
- train with kittens! ☆57 · Updated 7 months ago
- ☆31 · Updated 2 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆88 · Updated 2 weeks ago
- ☆53 · Updated last year
- seqax = sequence modeling + JAX ☆155 · Updated last month
- ☆51 · Updated last year
- Train your own SOTA deductive reasoning model ☆93 · Updated 3 months ago
- smolLM with Entropix sampler on PyTorch ☆150 · Updated 7 months ago
- ☆118 · Updated 2 weeks ago
- ☆76 · Updated last month
- History files recorded from human interaction while solving ARC tasks ☆110 · Updated 2 weeks ago
- Implementing the RASP transformer programming language (https://arxiv.org/pdf/2106.06981.pdf) ☆54 · Updated 3 years ago
- Harmonic Datasets ☆40 · Updated 10 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆100 · Updated 3 months ago