AI-Guru / helibrunna
A HuggingFace compatible Small Language Model trainer.
☆74 · Updated 3 months ago
Alternatives and similar repositories for helibrunna:
Users interested in helibrunna are comparing it to the libraries listed below.
- Implementation of a Light Recurrent Unit in Pytorch ☆46 · Updated 7 months ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ☆56 · Updated last month
- ☆24 · Updated last week
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆99 · Updated 4 months ago
- ☆81 · Updated last year
- A State-Space Model with Rational Transfer Function Representation. ☆78 · Updated 11 months ago
- SaLSa optimizer implementation (no learning rates needed) ☆29 · Updated last week
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆23 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆42 · Updated 11 months ago
- Notebook and scripts that showcase running quantized diffusion models on consumer GPUs ☆38 · Updated 6 months ago
- ☆45 · Updated 3 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆68 · Updated 2 weeks ago
- ☆49 · Updated 2 months ago
- Implementation of Agent Attention in Pytorch ☆89 · Updated 9 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax ☆87 · Updated 10 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated this week
- Pytorch (Lightning) implementation of the Mamba model ☆27 · Updated 2 weeks ago
- ☆39 · Updated this week
- Set of scripts to finetune LLMs ☆37 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 · Updated last year
- Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models) ☆34 · Updated 2 months ago
- Pytorch implementation of the xLSTM model by Beck et al. (2024) ☆162 · Updated 8 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆35 · Updated 2 months ago
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆46 · Updated 3 weeks ago
- ☆60 · Updated 5 months ago
- ☆19 · Updated last week
- ☆47 · Updated 8 months ago
- Code from our practical deep dive into using Mamba for information extraction ☆54 · Updated last year
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆59 · Updated 5 months ago
- Tokun to can tokens ☆17 · Updated this week