AI-Guru / helibrunna
A HuggingFace compatible Small Language Model trainer.
☆73 Updated last month
Related projects
Alternatives and complementary repositories for helibrunna
- ☆76 Updated 7 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆85 Updated 2 months ago
- Implementation of a Light Recurrent Unit in Pytorch ☆46 Updated last month
- SaLSa Optimizer implementation (No learning rates needed) ☆28 Updated this week
- Implementation of Agent Attention in Pytorch ☆86 Updated 4 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆70 Updated 6 months ago
- Collection of autoregressive model implementation ☆67 Updated this week
- This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R… ☆47 Updated 2 months ago
- ☆43 Updated 2 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax ☆86 Updated 5 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆59 Updated 6 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆113 Updated this week
- Library to facilitate pruning of LLMs based on context ☆31 Updated 9 months ago
- Implementation of Liquid Nets in Pytorch ☆52 Updated last week
- ☆93 Updated last month
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications ☆33 Updated last week
- Explorations into the recently proposed Taylor Series Linear Attention ☆90 Updated 3 months ago
- RWKV, in easy to read code ☆55 Updated this week
- Official implementation of "GPT or BERT: why not both?" ☆36 Updated last week
- Set of scripts to finetune LLMs ☆36 Updated 7 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆94 Updated 3 weeks ago
- ☆115 Updated 3 weeks ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆29 Updated last month
- Collection of tests performed during the study of the new Kolmogorov-Arnold Neural Networks (KAN) ☆34 Updated last month
- Implementation of BitNet-1.58 instruct tuning ☆18 Updated 7 months ago
- ☆108 Updated this week
- This is the code that went into our practical dive using mamba as information extraction ☆50 Updated 10 months ago
- Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs ☆36 Updated 3 weeks ago
- NLP with Rust for Python 🦀🐍 ☆59 Updated 5 months ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆51 Updated 3 weeks ago