AI-Guru / helibrunna
A HuggingFace compatible Small Language Model trainer.
☆74Updated 2 weeks ago
Alternatives and similar repositories for helibrunna:
Users interested in helibrunna are comparing it to the libraries listed below.
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"☆95Updated last month
- A State-Space Model with Rational Transfer Function Representation.☆77Updated 9 months ago
- A library for fast and efficient mLSTM Kernels.☆8Updated 2 months ago
- ☆78Updated 10 months ago
- SaLSa Optimizer implementation (No learning rates needed)☆28Updated 2 weeks ago
- Implementation of a Light Recurrent Unit in Pytorch☆48Updated 4 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers.☆65Updated 9 months ago
- ☆51Updated 5 months ago
- ☆41Updated 3 weeks ago
- my attempts at implementing various bits of Sepp Hochreiter's new xLSTM architecture☆129Updated 9 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto☆55Updated 9 months ago
- This is the official repo for Gradient Agreement Filtering (GAF).☆22Updated 3 weeks ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"☆34Updated this week
- Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.…☆82Updated last month
- Implementation of GateLoop Transformer in Pytorch and Jax☆87Updated 8 months ago
- This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R…☆53Updated 5 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch☆102Updated 2 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆25Updated 10 months ago
- Collection of autoregressive model implementations☆81Updated last week
- Generalist and Lightweight Model for Text Classification☆79Updated this week
- Notebook and Scripts that showcase running quantized diffusion models on consumer GPUs☆38Updated 3 months ago
- Implementation of Agent Attention in Pytorch☆90Updated 7 months ago
- Train, tune, and infer Bamba model☆84Updated last month
- An implementation of PSGD Kron second-order optimizer for PyTorch☆83Updated last week
- Audio tokenization, in the fastest way possible!☆48Updated 5 months ago
- Library to facilitate pruning of LLMs based on context☆32Updated last year
- ☆55Updated 3 months ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public☆73Updated this week