AI-Guru / helibrunna
A HuggingFace-compatible small language model trainer.
☆74 · Updated 2 months ago
Alternatives and similar repositories for helibrunna:
Users interested in helibrunna are comparing it to the libraries listed below.
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- Implementation of a Light Recurrent Unit in PyTorch ☆47 · Updated 6 months ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels ☆53 · Updated last week
- A State-Space Model with Rational Transfer Function Representation ☆78 · Updated 11 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆97 · Updated 7 months ago
- Implementation of Agent Attention in PyTorch ☆90 · Updated 9 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆35 · Updated last month
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆103 · Updated 4 months ago
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆58 · Updated 5 months ago
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- Attempt to make the multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆82 · Updated 2 months ago
- The code behind our practical deep dive into using Mamba for information extraction ☆53 · Updated last year
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆80 · Updated last year
- Implementation of the GateLoop Transformer in PyTorch and JAX ☆87 · Updated 9 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated 2 months ago
- Code and pretrained models for the paper "MatMamba: A Matryoshka State Space Model" ☆59 · Updated 4 months ago
- MLX implementation of the xLSTM model by Beck et al. (2024) ☆27 · Updated 10 months ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆23 · Updated 2 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆123 · Updated 7 months ago
- Notebooks and scripts showcasing how to run quantized diffusion models on consumer GPUs ☆38 · Updated 5 months ago
- SaLSa optimizer implementation (no learning rates needed) ☆29 · Updated this week
- Train, tune, and run inference with the Bamba model ☆88 · Updated 3 months ago
- A set of scripts to fine-tune LLMs ☆37 · Updated last year