AI-Guru / helibrunna
A HuggingFace-compatible small language model trainer.
☆74 · Updated 2 months ago
Alternatives and similar repositories for helibrunna:
Users interested in helibrunna are comparing it to the libraries listed below.
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- Implementation of a Light Recurrent Unit in PyTorch ☆47 · Updated 6 months ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels ☆53 · Updated last week
- A State-Space Model with Rational Transfer Function Representation ☆78 · Updated 11 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆97 · Updated 7 months ago
- Implementation of Agent Attention in PyTorch ☆90 · Updated 9 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆35 · Updated last month
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆103 · Updated 4 months ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆58 · Updated 5 months ago
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- Attempt to make the multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆82 · Updated 2 months ago
- The code behind our practical deep dive into using Mamba for information extraction ☆53 · Updated last year
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆80 · Updated last year
- Implementation of the GateLoop Transformer in PyTorch and JAX ☆87 · Updated 9 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated 2 months ago
- Code and pretrained models for the paper "MatMamba: A Matryoshka State Space Model" ☆59 · Updated 4 months ago
- MLX implementation of the xLSTM model by Beck et al. (2024) ☆27 · Updated 10 months ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆23 · Updated 2 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆123 · Updated 7 months ago
- Notebooks and scripts showcasing how to run quantized diffusion models on consumer GPUs ☆38 · Updated 5 months ago
- SaLSa optimizer implementation (no learning rates needed) ☆29 · Updated this week
- Train, tune, and run inference with the Bamba model ☆88 · Updated 3 months ago
- A set of scripts to fine-tune LLMs ☆37 · Updated last year