kabachuha / nanoGPKANT
Testing KAN-based text generation GPT models
☆18 · Updated last year
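The description above refers to replacing the MLP sublayers of a GPT with Kolmogorov-Arnold Network (KAN) layers, i.e. learnable univariate functions on edges instead of fixed activations on nodes. Below is a minimal PyTorch sketch of what such a substitution might look like; it is a hypothetical illustration, not code from the repository, and all names (`SimpleKANLayer`, `num_basis`, the Gaussian basis parameterization) are assumptions for the example.

```python
import torch
import torch.nn as nn

class SimpleKANLayer(nn.Module):
    """Simplified KAN-style layer (hypothetical, not the repo's implementation):
    each edge carries a learnable univariate function, parameterized as
    coefficients over a fixed Gaussian basis, plus a linear residual path."""
    def __init__(self, in_features: int, out_features: int, num_basis: int = 8):
        super().__init__()
        # Fixed basis centers spanning the expected input range.
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, num_basis))
        self.width = 4.0 / (num_basis - 1)  # spacing between basis centers
        # One coefficient per (output, input, basis) triple: the learnable
        # univariate function on each edge.
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_features, in_features, num_basis))
        # Residual linear path, as in common KAN implementations.
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_features); expand each scalar input over the basis
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # Sum the per-edge univariate functions into each output unit.
        spline = torch.einsum("...ib,oib->...o", phi, self.coeffs)
        return spline + self.linear(x)

if __name__ == "__main__":
    # Drop-in replacement for a GPT block's MLP (hypothetical sizes).
    d_model = 128
    ffn = nn.Sequential(SimpleKANLayer(d_model, 4 * d_model),
                        SimpleKANLayer(4 * d_model, d_model))
    x = torch.randn(2, 16, d_model)  # (batch, sequence, d_model)
    print(ffn(x).shape)              # torch.Size([2, 16, 128])
```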
Alternatives and similar repositories for nanoGPKANT
Users who are interested in nanoGPKANT are comparing it to the libraries listed below.
- Collection of autoregressive model implementations ☆86 · Updated 6 months ago
- ☆28 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆41 · Updated last year
- ☆62 · Updated last year
- An open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆107 · Updated 8 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated 11 months ago
- Curriculum training of instruction-following LLMs with Unsloth ☆14 · Updated 8 months ago
- ☆35 · Updated 2 years ago
- An introduction to LLM Sampling ☆79 · Updated 11 months ago
- ☆25 · Updated 11 months ago
- Functional local implementations of main model parallelism approaches ☆96 · Updated 2 years ago
- Using multiple LLMs for ensemble forecasting ☆16 · Updated last year
- Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊 ☆134 · Updated last month
- σ-GPT: A New Approach to Autoregressive Models ☆69 · Updated last year
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆24 · Updated last week
- QLoRA with Enhanced Multi GPU Support ☆37 · Updated 2 years ago
- ☆136 · Updated last year
- ☆40 · Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆58 · Updated last month
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci… ☆24 · Updated 2 years ago
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models" ☆166 · Updated 10 months ago
- Simple repository for training small reasoning models ☆45 · Updated 9 months ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 · Updated 2 years ago
- ☆11 · Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆18 · Updated 3 months ago
- Training Models Daily ☆16 · Updated last year
- GPT-2 small trained on phi-like data ☆67 · Updated last year
- Full finetuning of large language models without large memory requirements ☆94 · Updated 2 months ago
- Inference code for mixtral-8x7b-32kseqlen ☆102 · Updated last year
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated 2 months ago