LabStrangeLoop / bitnet

Train and evaluate 1.58-bit neural networks

☆26
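Here, "1.58 bits" refers to ternary weights in {-1, 0, +1}: one ternary value carries log2(3) ≈ 1.58 bits of information. The sketch below shows per-tensor ternary quantization with the absmean scale described in "The Era of 1-bit LLMs" paper, which is listed further down this page. It assumes, without confirmation, that this repository follows that scheme. The function names are hypothetical, and a real implementation would operate on tensors rather than Python lists.

```python
import math


def absmean_ternary(weights, eps=1e-5):
    """Hypothetical helper: quantize a non-empty list of floats to {-1, 0, +1}.

    Returns the ternary values and the per-tensor scale gamma.
    """
    # Per-tensor scale: mean absolute value of the weights.
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Divide by the scale, round to the nearest integer (Python's round()
    # rounds exact halves to even), then clip into [-1, 1].
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma


def dequantize(q, gamma):
    """Hypothetical helper: map ternary values back to approximate floats."""
    return [v * gamma for v in q]


# Information content of one ternary value: the "1.58 bits".
print(round(math.log2(3), 2))  # 1.58

q, gamma = absmean_ternary([0.9, -0.05, 0.4, -1.2])
print(q)  # [1, 0, 1, -1]
print(dequantize(q, gamma))
```

Only the ternary values and a single float scale per tensor need to be stored. Small weights such as -0.05 collapse to 0, and large weights saturate at ±1 after clipping.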

Alternatives and similar repositories for bitnet

Users interested in bitnet are comparing it to the libraries listed below.


astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆155 · Updated last year
Entropy-xcy / bitnet158
☆70 · Updated last year
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆224 · Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆108 · Updated 5 months ago
CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆122 · Updated last year
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆198 · Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆201 · Updated last year
lucidrains / PEER-pytorch
PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
☆132 · Updated 2 months ago
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆294 · Updated 7 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 · Updated last year
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆195 · Updated last year
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆186 · Updated 11 months ago
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆392 · Updated last year
amazon-science / mxfp4-llm
Official implementation of "Training LLMs with MXFP4"
☆116 · Updated 8 months ago
google-deepmind / asyncdiloco
☆47 · Updated last year
llm-random / llm-random
☆206 · Updated 3 weeks ago
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆186 · Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128 · Updated last year
NVlabs / hymba
☆205 · Updated last year
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆279 · Updated 2 years ago
kotak-ai / 1.58BitNet
Experimental BitNet Implementation
☆73 · Updated last month
IST-DASLab / QuEST
Work in progress.
☆76 · Updated last month
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLMs
☆121 · Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249 · Updated 11 months ago
kyegomez / MambaByte
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
☆126 · Updated 2 months ago
huggingface / kernels
Load compute kernels from the Hub
☆357 · Updated 3 weeks ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆248 · Updated 3 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆103 · Updated last year
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆81 · Updated last month
FasterDecoding / BitDelta
☆204 · Updated last year