TheSeriousProgrammer / SimpleBitNet

Simple Adaptation of BitNet

☆32

Alternatives and similar repositories for SimpleBitNet

Users who are interested in SimpleBitNet are comparing it to the repositories listed below.

Entropy-xcy / bitnet158
☆69Updated last year
CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆120Updated last year
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆222Updated last year
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆206Updated last year
Zyphra / BlackMamba
Code repository for Black Mamba
☆250Updated last year
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆183Updated 10 months ago
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆153Updated 9 months ago
syncdoth / RetNet
Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,…
☆226Updated last year
PeaBrane / mamba-tiny
Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence).
☆120Updated 9 months ago
pbelcak / UltraFastBERT
The repository for the code of the UltraFastBERT paper
☆516Updated last year
kyegomez / Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
☆173Updated 3 months ago
srush / annotated-mamba
Annotated version of the Mamba paper
☆486Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆85Updated 2 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆101Updated 6 months ago
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆232Updated 8 months ago
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆124Updated 11 months ago
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆37Updated last year
ariG23498 / fine-tune-paligemma
Notebooks for fine-tuning PaliGemma
☆111Updated 3 months ago
LambdaLabsML / distributed-training-guide
Best practices & guides on how to write distributed pytorch training code
☆450Updated 4 months ago
hkproj / pytorch-lora
LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch
☆112Updated last year
muellerzr / minimal-trainer-zoo
Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines
☆197Updated last year
johnma2006 / candle
Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
☆50Updated last year
dingo-actual / infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…
☆290Updated last year
SynodicMonth / ChebyKAN
Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.
☆381Updated last year
center-for-humans-and-machines / transformer-heads
Toolkit for attaching, training, saving and loading of new heads for transformer models
☆282Updated 4 months ago
apple / ml-sigma-reparam
☆304Updated last year
rasbt / cvpr2023
☆133Updated last year
kyegomez / zeta
Build high-performance AI models with modular building blocks
☆533Updated this week
Indoxer / LKAN
Variations of Kolmogorov-Arnold Networks
☆115Updated last year
HenryNdubuaku / nanodl
A JAX-based library for building transformers; includes implementations of GPT, Gemma, LLaMA, Mixtral, Whisper, Swin, ViT and more.
☆290Updated 10 months ago