Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆122 May 25, 2024 Updated last year
Alternatives and similar repositories for KAN-GPT-2
Users that are interested in KAN-GPT-2 are comparing it to the libraries listed below.
- Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines. ☆409 May 13, 2024 Updated last year
- The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling ☆725 Nov 25, 2024 Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 Apr 17, 2024 Updated 2 years ago
- Kolmogorov-Arnold Networks with various basis functions like B-Splines, Fourier, Chebyshev, Wavelets etc ☆37 May 8, 2024 Updated last year
- Your favourite classical machine learning algos on the GPU/TPU ☆22 Dec 14, 2025 Updated 4 months ago
- High order and sparse layers in pytorch. Lagrange Polynomial, Piecewise Lagrange Polynomial, Piecewise Discontinuous Lagrange Polynomial… ☆45 Jun 24, 2024 Updated last year
- Template repo for Python projects, especially those focusing on machine learning and/or deep learning. ☆15 Jan 14, 2026 Updated 3 months ago
- An easy to use PyTorch implementation of the Kolmogorov Arnold Network and a few novel variations ☆190 Nov 24, 2024 Updated last year
- A collection of reusable, high-performance, well-documented, thorough-tested layers and models in Jax ☆23 Jun 8, 2025 Updated 10 months ago
- ☆10 Oct 28, 2024 Updated last year
- Official PyTorch implementation of CD-MOE ☆12 Mar 18, 2026 Updated last month
- This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs). ☆11 May 24, 2024 Updated last year
- ☆19 May 11, 2024 Updated last year
- Code for experiments on transformers using Markovian data. ☆22 Nov 22, 2024 Updated last year
- [ICLR 2025] Code for the paper "Implicit Search via Discrete Diffusion: A Study on Chess" ☆37 Mar 3, 2025 Updated last year
- ☆32 May 5, 2024 Updated last year
- Implementation for paper Automata Extraction from Transformers. ☆12 Jun 8, 2024 Updated last year
- Reinforcement Learning example in Nim, playing tic tac toe. Based off original C version from the great Antirez ☆15 Apr 2, 2025 Updated last year
- Lion - EvoLved Sign Momentum w/ New Optimizer API in TensorFlow 2.11+ ☆10 Feb 16, 2023 Updated 3 years ago
- FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN) ☆477 Jun 20, 2024 Updated last year
- Lightning-like training API for JAX with Flax ☆45 Dec 8, 2024 Updated last year
- ☆11 Aug 20, 2025 Updated 7 months ago
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality ☆19 Mar 4, 2025 Updated last year
- Example of how to use R in Jupyter notebooks and make compatible with Binder ☆17 Feb 25, 2019 Updated 7 years ago
- ☆21 Mar 1, 2023 Updated 3 years ago
- ☆35 Apr 12, 2024 Updated 2 years ago
- ☆79 Feb 4, 2025 Updated last year
- ☆139 May 8, 2024 Updated last year
- ☆749 May 24, 2024 Updated last year
- ☆14 Apr 18, 2025 Updated last year
- JMLR Cover Letter Template ☆10 Dec 15, 2021 Updated 4 years ago
- KAN for Vision Transformer ☆254 Oct 7, 2024 Updated last year
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆455 May 13, 2025 Updated 11 months ago
- Implementation of Diffusion Transformers and Rectified Flow in Jax ☆27 Jul 9, 2024 Updated last year
- ☆27 Feb 1, 2023 Updated 3 years ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆15 Jun 28, 2025 Updated 9 months ago
- ☆35 Jun 2, 2025 Updated 10 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 Apr 20, 2024 Updated last year
- SOTA model implementations in JAX/FLAX ☆302 Aug 28, 2024 Updated last year