AdityaNG / kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
☆715 Updated 6 months ago
Alternatives and similar repositories for kan-gpt
Users who are interested in kan-gpt are comparing it to the libraries listed below.
- ☆736 Updated last year
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆375 Updated last year
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆876 Updated last month
- FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN) ☆414 Updated 11 months ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆117 Updated last year
- An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN). ☆4,368 Updated 10 months ago
- Schedule-Free Optimization in PyTorch ☆2,162 Updated last week
- A comprehensive collection of KAN (Kolmogorov-Arnold Network)-related resources, including libraries, projects, tutorials, papers, and mor… ☆2,947 Updated 3 months ago
- An easy to use PyTorch implementation of the Kolmogorov Arnold Network and a few novel variations ☆181 Updated 6 months ago
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆554 Updated 11 months ago
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆922 Updated last year
- Training LLMs with QLoRA + FSDP ☆1,479 Updated 6 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆550 Updated 5 months ago
- nanoGPT style version of Llama 3.1 ☆1,372 Updated 9 months ago
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,565 Updated 7 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆979 Updated 10 months ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆1,066 Updated 4 months ago
- A More Fair and Comprehensive Comparison between KAN and MLP ☆169 Updated 9 months ago
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch ☆1,830 Updated last month
- Official repository of the xLSTM. ☆1,880 Updated this week
- Thunder gives you PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and … ☆1,357 Updated this week
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆169 Updated last month
- Open weights language model from Google DeepMind, based on Griffin. ☆639 Updated last week
- Annotated version of the Mamba paper ☆482 Updated last year
- Variations of Kolmogorov-Arnold Networks ☆114 Updated last year
- The Tensor (or Array) ☆433 Updated 9 months ago
- NanoGPT (124M) in 3 minutes ☆2,600 Updated last week
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆981 Updated last month
- Beyond Language Models: Byte Models are Digital World Simulators ☆322 Updated 11 months ago
- LoRA and DoRA from Scratch Implementations ☆203 Updated last year