Graph-ZKY / CaFo
A PyTorch implementation of the Cascaded Forward (CaFo) algorithm
☆21 · Updated last year
Related projects:
- Implementation of Forward Forward Network proposed by Hinton in NIPS 2022. ☆161 · Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models" ☆104 · Updated 6 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆38 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆156 · Updated 4 months ago
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL) ☆42 · Updated 6 months ago
- Reimplementation of Geoffrey Hinton's Forward-Forward Algorithm ☆117 · Updated 10 months ago
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆186 · Updated 3 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆46 · Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆206 · Updated last month
- Implementation/simulation of the predictive forward-forward credit assignment algorithm for training neurobiologically-plausible recurren… ☆54 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆33 · Updated 3 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆61 · Updated last week
- A curated list of Model Merging methods. ☆71 · Updated this week
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆105 · Updated 3 weeks ago
- 94% on CIFAR-10 in 3.09 seconds 💨 96% in 27 seconds ☆127 · Updated last month
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆77 · Updated last year
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆94 · Updated last month
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta ☆103 · Updated last week
- PyTorch implementation of Hinton's FF Algorithm with hard negatives sampling ☆14 · Updated last year
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆233 · Updated 4 months ago
- Parallelizing non-linear sequential models over the sequence length ☆40 · Updated last month
- Official implementation for Equivariant Architectures for Learning in Deep Weight Spaces [ICML 2023] ☆81 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆100 · Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆84 · Updated 4 months ago
- Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023] ☆30 · Updated 3 weeks ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆63 · Updated 3 months ago