kabachuha / nanoGPKANT
Testing KAN-based text generation GPT models
☆18 · Updated last year
Alternatives and similar repositories for nanoGPKANT
Users interested in nanoGPKANT are comparing it to the libraries listed below.
- Collection of autoregressive model implementations ☆85 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆41 · Updated last year
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆109 · Updated 10 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated last year
- A minimal fine-tuning repo for LFM2, fully built on Open Source. ☆69 · Updated last week
- Fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆233 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated last year
- ☆62 · Updated 2 years ago
- Google TPU optimizations for transformers models ☆133 · Updated 3 weeks ago
- ☆137 · Updated last year
- ☆86 · Updated last year
- Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊 ☆136 · Updated 2 months ago
- Set of scripts to finetune LLMs ☆38 · Updated last year
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 · Updated 2 years ago
- Full finetuning of large language models without large memory requirements ☆94 · Updated 3 months ago
- ☆45 · Updated 2 years ago
- ☆27 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ☆170 · Updated 11 months ago
- Implementation of Mamba in Rust ☆89 · Updated last year
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricacies… ☆24 · Updated 2 years ago
- Modeling code for a BitNet b1.58 Llama-style model. ☆25 · Updated last year
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models" ☆167 · Updated 11 months ago
- An introduction to LLM sampling ☆79 · Updated last year
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 · Updated 2 months ago
- An all-new language model that processes ultra-long sequences of 100,000+ tokens ultra-fast ☆149 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆70 · Updated last year
- Implementation of the Mamba SSM with hf_integration. ☆56 · Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach… ☆169 · Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget ☆164 · Updated 5 months ago
- Video+code lecture on building nanoGPT from scratch ☆68 · Updated last year