AdityaNG / kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
☆711 · Updated last month
Alternatives and similar repositories for kan-gpt:
Users interested in kan-gpt are comparing it to the repositories listed below.
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆357 · Updated 8 months ago
- ☆715 · Updated 7 months ago
- FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN) ☆377 · Updated 6 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆831 · Updated last month
- An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN). ☆4,192 · Updated 5 months ago
- A comprehensive collection of KAN (Kolmogorov-Arnold Network)-related resources, including libraries, projects, tutorials, papers, and more. ☆2,705 · Updated this week
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆113 · Updated 7 months ago
- Variations of Kolmogorov-Arnold Networks ☆112 · Updated 8 months ago
- An easy-to-use PyTorch implementation of the Kolmogorov-Arnold Network and a few novel variations ☆169 · Updated last month
- Code for the BLT research paper ☆1,314 · Updated this week
- Reaching LLaMA2 Performance with 0.1M Dollars ☆965 · Updated 5 months ago
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆538 · Updated 6 months ago
- NanoGPT (124M) in 3.4 minutes ☆2,068 · Updated last week
- Schedule-Free Optimization in PyTorch ☆2,061 · Updated last month
- Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors a… ☆1,257 · Updated this week
- Build high-performance AI models with modular building blocks ☆456 · Updated this week
- PyTorch implementation of "Jamba: A Hybrid Transformer-Mamba Language Model" ☆154 · Updated 2 months ago
- The Multilayer Perceptron Language Model ☆532 · Updated 5 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,481 · Updated 2 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆286 · Updated 8 months ago
- Annotated version of the Mamba paper ☆469 · Updated 10 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆541 · Updated 3 weeks ago
- A PyTorch-native library for large model training ☆3,091 · Updated this week
- Official repository of the xLSTM. ☆1,635 · Updated this week
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence). ☆104 · Updated 3 months ago
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆916 · Updated 10 months ago
- Minimalistic 4D-parallelism distributed training framework for education purposes ☆644 · Updated this week
- Open-weights language model from Google DeepMind, based on Griffin. ☆614 · Updated 6 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆178 · Updated 4 months ago
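Several of the repositories above replace the B-spline basis of the original KAN with Chebyshev polynomials: each edge learns a univariate function expressed as a weighted sum of Chebyshev polynomials of the first kind. A minimal sketch of such a layer is below; it is illustrative only (the class name, shapes, and normalization are assumptions, not the API of any listed repository):

```python
import torch
import torch.nn as nn

class ChebyKANLayer(nn.Module):
    """Sketch of a KAN layer with a Chebyshev polynomial basis (hypothetical)."""

    def __init__(self, in_dim: int, out_dim: int, degree: int = 4):
        super().__init__()
        self.degree = degree
        # One learnable coefficient per (input, output, basis function).
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1)
            / (in_dim * (degree + 1)) ** 0.5
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squash inputs into [-1, 1], the domain of Chebyshev polynomials.
        x = torch.tanh(x)
        # Build T_0..T_degree via the recurrence T_k = 2x*T_{k-1} - T_{k-2}.
        T = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])
        basis = torch.stack(T[: self.degree + 1], dim=-1)  # (..., in_dim, degree+1)
        # Sum the learned univariate functions over inputs, per output unit.
        return torch.einsum("...id,iod->...o", basis, self.coeffs)

layer = ChebyKANLayer(8, 3)
out = layer(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 3])
```

Compared with the B-spline basis of the original KAN, the Chebyshev recurrence needs no grid of knots and is a few dense tensor ops, which is why these variants are typically faster on GPU.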