pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆220 · Updated 11 months ago
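Fast feedforward networks replace one wide MLP with a binary tree of scalar routing nodes: each input descends the tree in d decisions and then evaluates only the single small leaf MLP it lands on, giving log-time inference in the layer width. The sketch below illustrates that hard-routing idea only; the class name, shapes, and hyperparameters are illustrative assumptions rather than this repository's actual API, and the training-time soft routing used by the paper is omitted.

```python
import torch
import torch.nn as nn


class FastFeedForwardSketch(nn.Module):
    """Hard-routed tree feedforward layer (inference-style sketch, not the repo's API).

    A depth-d binary tree of scalar routing nodes sends each input to one of
    2**d small leaf MLPs, so only d routing decisions and one leaf are
    evaluated per token instead of the full layer width.
    """

    def __init__(self, dim: int, leaf_dim: int, depth: int):
        super().__init__()
        self.depth = depth
        n_nodes = 2 ** depth - 1          # internal routing nodes (heap layout)
        n_leaves = 2 ** depth             # leaf experts
        self.node = nn.Linear(dim, n_nodes)  # one routing logit per internal node
        self.leaf_in = nn.Parameter(torch.randn(n_leaves, dim, leaf_dim) * dim ** -0.5)
        self.leaf_out = nn.Parameter(torch.randn(n_leaves, leaf_dim, dim) * leaf_dim ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, dim)
        logits = self.node(x)                              # (batch, n_nodes)
        idx = torch.zeros(x.size(0), dtype=torch.long, device=x.device)
        for _ in range(self.depth):                        # descend the tree
            go_right = (logits.gather(1, idx.unsqueeze(1)).squeeze(1) > 0).long()
            idx = 2 * idx + 1 + go_right                   # child in heap numbering
        leaf = idx - (2 ** self.depth - 1)                 # node index -> leaf index
        h = torch.einsum('bd,bdh->bh', x, self.leaf_in[leaf]).relu()
        return torch.einsum('bh,bhd->bd', h, self.leaf_out[leaf])


# Hypothetical usage: 128 leaves of width 32 (4096 hidden units total),
# but only 7 routing decisions and one 32-unit leaf are evaluated per token.
layer = FastFeedForwardSketch(dim=768, leaf_dim=32, depth=7)
out = layer(torch.randn(16, 768))        # (16, 768)
```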
Alternatives and similar repositories for fastfeedforward:
Users interested in fastfeedforward are comparing it to the libraries listed below.
- The AdEMAMix Optimizer: Better, Faster, Older. ☆178 · Updated 6 months ago
- The repository for the code of the UltraFastBERT paper ☆517 · Updated 11 months ago
- ☆94 · Updated 9 months ago
- Accelerated First Order Parallel Associative Scan ☆174 · Updated 6 months ago
- supporting pytorch FSDP for optimizers ☆79 · Updated 3 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆223 · Updated 3 weeks ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆304 · Updated 2 months ago
- Understand and test language model architectures on synthetic tasks. ☆183 · Updated last week
- Efficient optimizers ☆181 · Updated this week
- Some preliminary explorations of Mamba's context scaling. ☆213 · Updated last year
- Code repository for Black Mamba ☆240 · Updated last year
- Annotated version of the Mamba paper ☆474 · Updated last year
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆86 · Updated 7 months ago
- ☆301 · Updated 8 months ago
- Scalable and Performant Data Loading ☆224 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆123 · Updated 3 months ago
- WIP ☆93 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆100 · Updated 3 months ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆226 · Updated 6 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆101 · Updated 3 months ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆163 · Updated last month
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆121 · Updated 6 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆222 · Updated last month
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆506 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆107 · Updated 2 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆216 · Updated last week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆270 · Updated last year
- ☆53 · Updated last year
- Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs. ☆381 · Updated 9 months ago
- Collection of autoregressive model implementations ☆83 · Updated last month