CG80499 / KAN-GPT-2

Training small GPT-2 style models using Kolmogorov-Arnold networks.

☆121

Alternatives and similar repositories for KAN-GPT-2

Users who are interested in KAN-GPT-2 are comparing it to the repositories listed below.

LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆102 Updated 3 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆59 Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 Updated 10 months ago
Zyphra / BlackMamba
Code repository for Black Mamba
☆258 Updated last year
jacobfa / fft
☆128 Updated 2 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
☆129 Updated last year
johnma2006 / candle
Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
☆52 Updated last year
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆291 Updated 4 months ago
epfml / DenseFormer
☆81 Updated last year
kyegomez / Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
☆192 Updated this week
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆167 Updated 8 months ago
kyegomez / MambaTransformer
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
☆207 Updated last week
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 6 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185 Updated 9 months ago
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆186 Updated last year
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 Updated 11 months ago
ruke1ire / RTF
A State-Space Model with Rational Transfer Function Representation.
☆82 Updated last year
zaydzuhri / softpick-attention
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
☆85 Updated last month
alxndrTL / othello_mamba
Evaluating the Mamba architecture on the Othello game
☆48 Updated last year
KindXiaoming / grow-crystals
Getting crystal-like representations with harmonic loss
☆192 Updated 6 months ago
PeaBrane / mamba-tiny
Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence).
☆125 Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 Updated last year
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆222 Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113 Updated 9 months ago
kyegomez / swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
☆133 Updated last week
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99 Updated last year
CLAIRE-Labo / EvoTune
Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning.
☆116 Updated this week
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆195 Updated 10 months ago
SynodicMonth / ChebyKAN
Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.
☆391 Updated last year
Oxen-AI / mamba-dive
The code behind our practical dive into using Mamba for information extraction
☆55 Updated last year