AdityaNG / kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
☆703 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for kan-gpt
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆347 · Updated 6 months ago
- FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN) ☆367 · Updated 5 months ago
- Official repository of the xLSTM. ☆1,407 · Updated 2 weeks ago
- ☆713 · Updated 5 months ago
- Schedule-Free Optimization in PyTorch ☆1,898 · Updated 2 weeks ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆108 · Updated 5 months ago
- A comprehensive collection of KAN (Kolmogorov-Arnold Network)-related resources, including libraries, projects, tutorials, papers, and mor… ☆2,578 · Updated 2 weeks ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆803 · Updated 3 months ago
- This project extends the idea of the innovative architecture of Kolmogorov-Arnold Networks (KAN) to the Convolutional Layers, changing th… ☆781 · Updated 2 weeks ago
- KAN for Vision Transformer ☆232 · Updated last month
- An easy-to-use PyTorch implementation of the Kolmogorov-Arnold Network and a few novel variations ☆159 · Updated 3 months ago
- nanoGPT style version of Llama 3.1 ☆1,246 · Updated 3 months ago
- NanoGPT (124M) quality in 7.8 8xH100-minutes ☆1,033 · Updated this week
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆515 · Updated 4 months ago
- Understanding Kolmogorov-Arnold Networks: A Tutorial Series on KAN using Toy Examples ☆165 · Updated last month
- The Multilayer Perceptron Language Model ☆523 · Updated 3 months ago
- Variations of Kolmogorov-Arnold Networks ☆111 · Updated 6 months ago
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ☆835 · Updated 7 months ago
- A native PyTorch Library for large model training ☆2,623 · Updated this week
- Resources about xLSTM by Sepp Hochreiter ☆295 · Updated last week
- UNet diffusion model in pure CUDA ☆584 · Updated 4 months ago
- System 2 Reasoning Link Collection ☆693 · Updated 3 weeks ago
- Open weights language model from Google DeepMind, based on Griffin. ☆607 · Updated 4 months ago
- The best repository showing why transformers might not be the answer for time series forecasting and showcasing the best SOTA non transfo… ☆520 · Updated last week
- Best practices & guides on how to write distributed pytorch training code ☆286 · Updated 2 weeks ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆280 · Updated 6 months ago
- Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆1,040 · Updated 4 months ago
- The Autograd Engine ☆534 · Updated 2 months ago
- The Tensor (or Array) ☆411 · Updated 3 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆960 · Updated 3 months ago