AlarioAI / bitnet

Train and evaluate 1.58-bit neural networks

☆26

Alternatives and similar repositories for bitnet

Users who are interested in bitnet are comparing it to the libraries listed below.

Entropy-xcy / bitnet158
☆69 Updated last year
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 Updated 9 months ago
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆223 Updated last year
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆190 Updated 8 months ago
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆376 Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆100 Updated 3 weeks ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆199 Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆244 Updated 6 months ago
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆55 Updated 3 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆65 Updated last year
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆289 Updated 2 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆186 Updated 8 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127 Updated 8 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128 Updated 9 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 3 months ago
lucidrains / PEER-pytorch
PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
☆127 Updated 11 months ago
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLMs
☆115 Updated 8 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆101 Updated 7 months ago
dvruette / barrel-rec-pytorch
☆53 Updated last year
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆153 Updated last year
FasterDecoding / BitDelta
☆199 Updated 8 months ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94 Updated 8 months ago
huggingface / kernels
Load compute kernels from the Hub
☆233 Updated this week
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆184 Updated 11 months ago
IST-DASLab / QuEST
Work in progress.
☆70 Updated last month
llm-random / llm-random
☆192 Updated this week
ethansmith2000 / fsdp_optimizers
Supporting PyTorch FSDP for optimizers
☆84 Updated 8 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆184 Updated 6 months ago