JacksonWuxs / Forward-Forward-Network
Implementation of the Forward-Forward Network proposed by Hinton at NeurIPS 2022.
☆169 · Updated 2 years ago
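The repository above, and several of the reimplementations listed below, train each layer with Hinton's layer-local "goodness" objective rather than end-to-end backpropagation. The following is a minimal sketch of that idea in PyTorch, not the repository's actual code; the layer sizes, threshold, use of the mean squared activation as goodness, and optimizer settings are illustrative assumptions.

```python
# Minimal Forward-Forward sketch (illustrative assumptions, not JacksonWuxs's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    """One fully connected layer trained with a local goodness objective."""
    def __init__(self, in_dim, out_dim, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.threshold = threshold  # goodness target separating positive/negative data
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize the input so the previous layer's goodness cannot be
        # read off trivially, then apply the affine map and ReLU.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = mean squared activation; push it above the threshold for
        # positive samples and below it for negative samples.
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach the outputs so no gradient flows between layers: training stays local.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Usage: train a small stack layer by layer on positive/negative batches.
layers = [FFLayer(784, 500), FFLayer(500, 500)]
x_pos, x_neg = torch.rand(64, 784), torch.rand(64, 784)
for layer in layers:
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```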
Alternatives and similar repositories for Forward-Forward-Network
Users interested in Forward-Forward-Network are comparing it to the repositories listed below.
- Reimplementation of Geoffrey Hinton's Forward-Forward Algorithm ☆149 · Updated last year
- ☆191 · Updated last year
- PyTorch implementation for Vision Transformer [Dosovitskiy, A. (ICLR'21)] modified to obtain over 90% accuracy FROM SCRATCH on CIFAR-10 wit… ☆197 · Updated last year
- Implementation of Block Recurrent Transformer - Pytorch ☆219 · Updated 10 months ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer ☆60 · Updated last year
- ☆63 · Updated 3 years ago
- ☆208 · Updated 2 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆299 · Updated 2 months ago
- Implementation of Infini-Transformer in Pytorch ☆111 · Updated 5 months ago
- ☆33 · Updated 4 years ago
- ☆147 · Updated 2 years ago
- ☆131 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆108 · Updated last month
- [ICLR'24] "DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training" by Aochuan Chen*, Yimeng Zhang*, Jinghan Jia, James Di… ☆59 · Updated 8 months ago
- Parallelizing non-linear sequential models over the sequence length ☆52 · Updated 5 months ago
- Some preliminary explorations of Mamba's context scaling. ☆214 · Updated last year
- Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆67 · Updated 11 months ago
- Sequence modeling with Mega. ☆296 · Updated 2 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆204 · Updated last year
- PyTorch implementation of Mixer-nano (#parameters is 0.67M, originally Mixer-S/16 has 18M) with 90.83% acc. on CIFAR-10. Training from s… ☆32 · Updated 3 years ago
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆59 · Updated 3 months ago
- ☆105 · Updated last year
- Official PyTorch Implementation for Fast Adaptive Multitask Optimization (FAMO) ☆90 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆86 · Updated 2 years ago
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆119 · Updated 8 months ago
- Crawl & Visualize ICLR 2023 Data from OpenReview ☆84 · Updated 2 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆124 · Updated last year
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch ☆409 · Updated 5 months ago
- ☆46 · Updated 2 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆105 · Updated 4 years ago