CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆114 · Updated 9 months ago
Alternatives and similar repositories for KAN-GPT-2:
Users who are interested in KAN-GPT-2 are comparing it to the libraries listed below.
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆365 · Updated 9 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆53 · Updated 11 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆97 · Updated 2 months ago
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆121 · Updated 6 months ago
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta ☆114 · Updated last month
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆159 · Updated last month
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆76 · Updated 3 weeks ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file. Using logcumsumexp (Heisen sequence). ☆113 · Updated 4 months ago
- Trying out the Mamba architecture on small examples (CIFAR-10, Shakespeare char-level, etc.) ☆44 · Updated last year
- Collection of autoregressive model implementations ☆83 · Updated 3 weeks ago
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆101 · Updated 3 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆188 · Updated last month
- The AdEMAMix Optimizer: Better, Faster, Older. ☆178 · Updated 6 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆94 · Updated 6 months ago
- ☆89 · Updated last month
- σ-GPT: A New Approach to Autoregressive Models ☆61 · Updated 6 months ago
- Implementation of Infini-Transformer in PyTorch ☆109 · Updated 2 months ago
- A More Fair and Comprehensive Comparison between KAN and MLP ☆161 · Updated 6 months ago
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆120 · Updated 7 months ago
- Normalized Transformer (nGPT) ☆156 · Updated 3 months ago
- An easy-to-use PyTorch implementation of the Kolmogorov-Arnold Network and a few novel variations ☆174 · Updated 3 months ago
- This is the code that went into our practical dive using Mamba for information extraction ☆52 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ☆163 · Updated last month
- RWKV, in easy-to-read code ☆69 · Updated 3 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆213 · Updated 2 weeks ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆33 · Updated last month
- A State-Space Model with Rational Transfer Function Representation. ☆77 · Updated 9 months ago
- Code repository for Black Mamba ☆240 · Updated last year