a-r-r-o-w / kanformer
Naively combining transformers and Kolmogorov-Arnold Networks to learn and experiment
☆35 Updated 7 months ago
Alternatives and similar repositories for kanformer:
Users who are interested in kanformer are comparing it to the libraries listed below.
- A modified CNN architecture using Kolmogorov-Arnold Networks ☆70 Updated 9 months ago
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆158 Updated last month
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆114 Updated 9 months ago
- my attempts at implementing various bits of Sepp Hochreiter's new xLSTM architecture ☆129 Updated 9 months ago
- Variations of Kolmogorov-Arnold Networks ☆113 Updated 9 months ago
- ☆127 Updated 9 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆187 Updated last month
- Benchmarking and Testing FastKAN ☆71 Updated 9 months ago
- Trying out the Mamba architecture on small examples (CIFAR-10, Shakespeare char-level, etc.) ☆44 Updated last year
- This repository contains a better implementation of Kolmogorov-Arnold networks ☆61 Updated 9 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆52 Updated 10 months ago
- KAN for Vision Transformer ☆243 Updated 4 months ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file. Using logcumsumexp (Heinsen sequence). ☆112 Updated 4 months ago
- Implementation of Agent Attention in PyTorch ☆90 Updated 7 months ago
- First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting… ☆139 Updated this week
- Benchmark for efficiency in memory and time of different KAN implementations. ☆118 Updated 6 months ago
- An easy to use PyTorch implementation of the Kolmogorov Arnold Network and a few novel variations ☆174 Updated 3 months ago
- Kolmogorov-Arnold Networks (KAN) using Jacobi polynomials instead of B-splines. ☆36 Updated 9 months ago
- ☆85 Updated 8 months ago
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆364 Updated 9 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 Updated last month
- Collection of tests performed during the study of the new Kolmogorov-Arnold Neural Networks (KAN) ☆36 Updated this week
- Kolmogorov–Arnold Networks with modified activation (using MLP to represent the activation) ☆103 Updated 4 months ago
- ☆15 Updated 4 months ago
- Implementation of xLSTM in PyTorch from the paper: "xLSTM: Extended Long Short-Term Memory" ☆118 Updated last month
- Transformer model based on the Kolmogorov–Arnold Network (KAN), an alternative to the Multi-Layer Perceptron (MLP) ☆27 Updated this week
- Working implementation of DeepSeek MLA ☆35 Updated last month
- A Triton Kernel for incorporating Bi-Directionality in Mamba2 ☆61 Updated 2 months ago