CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆122 · Updated last year
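The idea behind the headline repo is to replace the fixed activations of a standard MLP with learnable one-dimensional functions on each edge, per the Kolmogorov-Arnold network (KAN) formulation. The sketch below is a minimal, hypothetical illustration of one such layer: it parameterises each edge function as a weighted sum of fixed Gaussian radial basis functions (a simplification used by some KAN variants; the original KAN paper and this repo may instead use B-splines, and the class and parameter names here are assumptions, not the repo's API).

```python
import torch

class KANLayer(torch.nn.Module):
    """Simplified Kolmogorov-Arnold layer: each edge (i, j) applies a
    learnable 1-D function, parameterised as a weighted sum of fixed
    Gaussian radial basis functions over a 1-D grid."""

    def __init__(self, in_dim, out_dim, num_basis=8, grid=(-2.0, 2.0)):
        super().__init__()
        centers = torch.linspace(grid[0], grid[1], num_basis)
        self.register_buffer("centers", centers)
        self.width = (grid[1] - grid[0]) / (num_basis - 1)
        # one coefficient vector per edge: (out_dim, in_dim, num_basis)
        self.coef = torch.nn.Parameter(
            torch.randn(out_dim, in_dim, num_basis) * 0.1)

    def forward(self, x):                      # x: (batch, in_dim)
        # evaluate the RBF basis at each input: (batch, in_dim, num_basis)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # weight each basis value by its edge coefficient, sum over
        # basis functions k and input features i
        return torch.einsum("bik,oik->bo", phi, self.coef)

x = torch.randn(4, 16)
layer = KANLayer(16, 32)
print(layer(x).shape)  # torch.Size([4, 32])
```

In a GPT-2-style model, layers like this would stand in for the `nn.Linear` projections of the feed-forward blocks; the trade-off is more parameters per edge in exchange for learnable nonlinearities.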
Alternatives and similar repositories for KAN-GPT-2
Users interested in KAN-GPT-2 are comparing it to the repositories listed below.
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated last year
- ☆109 · Updated 5 months ago
- PyTorch implementation of models from the Zamba2 series. ☆186 · Updated 11 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆61 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ☆170 · Updated 11 months ago
- Collection of autoregressive model implementations ☆85 · Updated last week
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆132 · Updated 2 months ago
- ☆129 · Updated 5 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from Nvidia AI ☆293 · Updated 7 months ago
- Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊 ☆136 · Updated this week
- ☆82 · Updated last year
- A state-space model with rational transfer function representation. ☆83 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆41 · Updated last year
- Attempt to make multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆148 · Updated this week
- Getting crystal-like representations with harmonic loss ☆195 · Updated 9 months ago
- Normalized Transformer (nGPT) ☆196 · Updated last year
- ☆161 · Updated 2 months ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence). ☆129 · Updated last year
- Code repository for BlackMamba ☆261 · Updated last year
- DeMo: Decoupled Momentum Optimization ☆198 · Updated last year
- 📄 Small Batch Size Training for Language Models ☆79 · Updated 3 months ago
- A repository for log-time feedforward networks ☆224 · Updated last year
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆53 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆198 · Updated last year
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆126 · Updated last year
- My attempts at implementing various bits of Sepp Hochreiter's new xLSTM architecture ☆134 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆70 · Updated last year
- PyTorch implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆203 · Updated 2 weeks ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ☆82 · Updated last month