lhallee / MOE-BitNet158

☆10

Alternatives and similar repositories for MOE-BitNet158

Users who are interested in MOE-BitNet158 are comparing it to the repositories listed below.

sustcsonglin / gated_linear_attention_layer
☆32 · Updated last year
OpenMOSE / RWKV-Infer
A large-scale RWKV v6, v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to de…
☆38 · Updated this week
Yuan-ManX / Titans-PyTorch
PyTorch implementation of Titans.
☆23 · Updated 5 months ago
yynil / RWKVInside
☆37 · Updated 2 months ago
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆32 · Updated 10 months ago
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28 · Updated last month
hamishivi / tess-2
Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"
☆36 · Updated 4 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45 · Updated 11 months ago
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆63 · Updated last year
snu-mllab / GuidedQuant
Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
☆30 · Updated this week
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 · Updated 2 weeks ago
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆59 · Updated 3 years ago
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆47 · Updated 2 months ago
cloneofsimo / fim-llama-deepspeed
☆31 · Updated last year
selfsupervised-ai / Natural-GaLore
An extension of the GaLore paper to perform Natural Gradient Descent in a low-rank subspace
☆17 · Updated 8 months ago
howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆41 · Updated 2 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆77 · Updated last year
tobiaskatsch / GatedLinearRNN
☆27 · Updated last year
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆37 · Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆59 · Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆92 · Updated 7 months ago
schneiderkamplab / bitlinear
BitLinear implementation
☆31 · Updated 6 months ago
jadechip / nanoXLSTM
The simplest, fastest repository for training/finetuning medium-sized xLSTMs.
☆41 · Updated last year
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆65 · Updated last year
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆41 · Updated 7 months ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in PyTorch and JAX
☆89 · Updated last year
gmongaras / Cottention_Transformer
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
☆17 · Updated 8 months ago
erogol / BlaGPT
Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible…
☆57 · Updated last week
chu-tianxiang / QuIP-for-all
QuIP quantization
☆54 · Updated last year
catid / spectral_ssm
Implementation of Spectral State Space Models
☆16 · Updated last year