a-r-r-o-w/kanformer
Naively combining transformers and Kolmogorov-Arnold Networks to learn and experiment
☆35 · Updated 6 months ago
Alternatives and similar repositories for kanformer:
Users interested in kanformer are comparing it to the libraries listed below.
- First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting… ☆122 · Updated last week
- ☆80 · Updated 7 months ago
- Implementation of Agent Attention in PyTorch ☆89 · Updated 6 months ago
- Kolmogorov–Arnold Networks with modified activation (using an MLP to represent the activation) ☆102 · Updated 3 months ago
- ☆122 · Updated 8 months ago
- PyTorch implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆155 · Updated this week
- Variations of Kolmogorov-Arnold Networks ☆112 · Updated 8 months ago
- Trying out the Mamba architecture on small examples (CIFAR-10, character-level Shakespeare, etc.) ☆43 · Updated last year
- Integrating Mamba/SSMs with Transformers for enhanced long-context, high-quality sequence modeling ☆182 · Updated this week
- My attempts at implementing various bits of Sepp Hochreiter's new xLSTM architecture ☆129 · Updated 8 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 9 months ago
- This repository contains a better implementation of Kolmogorov-Arnold networks ☆59 · Updated 8 months ago
- KAN for Vision Transformer ☆240 · Updated 3 months ago
- Transformer model based on the Kolmogorov–Arnold Network (KAN), an alternative to the Multi-Layer Perceptron (MLP) ☆26 · Updated 2 months ago
- A modified CNN architecture using Kolmogorov-Arnold Networks ☆70 · Updated 8 months ago
- Benchmarking and testing FastKAN ☆70 · Updated 8 months ago
- Training small GPT-2-style models using Kolmogorov-Arnold networks ☆113 · Updated 8 months ago
- PyTorch (Lightning) implementation of the Mamba model ☆23 · Updated 9 months ago
- PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model … ☆51 · Updated 3 months ago
- Unofficial implementation of the Selective Attention Transformer ☆14 · Updated 3 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 · Updated this week
- An attempt to make the multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆66 · Updated last week
- Implementation of Infini-Transformer in PyTorch ☆109 · Updated 3 weeks ago
- Implementation of xLSTM in PyTorch from the paper "xLSTM: Extended Long Short-Term Memory" ☆115 · Updated this week
- The official repository for "HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction" ☆31 · Updated 2 weeks ago
- Collection of autoregressive model implementations ☆77 · Updated 3 weeks ago
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆211 · Updated 8 months ago
- Awesome list of papers that extend Mamba to various applications ☆129 · Updated last month
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 · Updated this week
- An easy-to-use PyTorch implementation of the Kolmogorov-Arnold Network and a few novel variations ☆169 · Updated 2 months ago
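The common thread in the KAN-flavored repos above is swapping a transformer's MLP sub-layer for a KAN layer, in which each input-output edge learns its own univariate function rather than a single scalar weight. A minimal NumPy sketch of one such layer is below; it uses a Gaussian RBF basis as a stand-in for the B-spline basis of the original KAN paper, and the function name and shapes are illustrative assumptions, not code from any repository listed here.

```python
import numpy as np

def kan_layer(x, centers, coeffs):
    """One KAN-style layer: y_j = sum_i phi_ij(x_i).

    x:       (batch, d_in)   input activations
    centers: (num_basis,)    shared RBF centers
    coeffs:  (d_in, d_out, num_basis) per-edge basis coefficients,
             so every edge (i, j) has its own learned univariate function
    """
    # Evaluate the Gaussian RBF basis on each scalar input feature:
    # basis[b, i, k] = exp(-(x[b, i] - centers[k])^2)
    basis = np.exp(-(x[..., None] - centers) ** 2)  # (batch, d_in, num_basis)
    # Combine per-edge: sum over input features i and basis functions k
    return np.einsum('bik,iok->bo', basis, coeffs)  # (batch, d_out)

# Illustrative usage: a KAN "feed-forward" stack replacing an MLP block
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                    # batch of 4, model dim 8
centers = np.linspace(-2.0, 2.0, 5)
w1 = rng.standard_normal((8, 32, 5)) * 0.1         # expand to hidden dim 32
w2 = rng.standard_normal((32, 8, 5)) * 0.1         # project back to dim 8
hidden = kan_layer(x, centers, w1)
out = kan_layer(hidden, centers, w2)
```

In a full kanformer-style block, `out` would be added back to `x` via the usual residual connection; the attention sub-layer is unchanged, since only the pointwise MLP is replaced.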