kabachuha / nanoGPKANT
Testing KAN-based text generation GPT models
☆18Updated last year
Alternatives and similar repositories for nanoGPKANT
Users that are interested in nanoGPKANT are comparing it to the libraries listed below
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"☆162Updated 7 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆105Updated 5 months ago
- ☆61Updated last year
- ☆11Updated last year
- Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊☆129Updated 3 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs.☆41Updated last year
- Implementation of Mamba in Rust☆88Updated last year
- Collection of autoregressive model implementations☆86Updated 4 months ago
- Using multiple LLMs for ensemble Forecasting☆16Updated last year
- ☆27Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"☆101Updated 8 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆31Updated last year
- σ-GPT: A New Approach to Autoregressive Models☆67Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.☆19Updated last month
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci…☆24Updated last year
- Torch-activation, a library of activation functions for PyTorch☆26Updated 4 months ago
- Functional local implementations of main model parallelism approaches☆96Updated 2 years ago
- Implementation of the Mamba SSM with hf_integration.☆56Updated last year
- QLoRA for Masked Language Modeling☆22Updated last year
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling"☆16Updated 9 months ago
- An alternative way of calculating self-attention☆18Updated last year
- aesthetic tensor visualiser☆24Updated 4 months ago
- QLoRA with Enhanced Multi GPU Support☆37Updated 2 years ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆55Updated 7 months ago
- This is the code that went into our practical dive into using Mamba for information extraction☆55Updated last year
- An introduction to LLM Sampling☆79Updated 8 months ago
- HomebrewNLP in JAX flavour for maintainable TPU-Training☆50Updated last year
- ☆88Updated last year
- An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Tokens Ultra-Fast☆151Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale☆109Updated 4 months ago