Zyphra / Zamba2

PyTorch implementation of models from the Zamba2 series.

☆186

Alternatives and similar repositories for Zamba2

Users who are interested in Zamba2 compare it to the libraries listed below.

VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202 Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆111 Updated 7 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249 Updated 10 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆101 Updated last year
lucidrains / PEER-pytorch
PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
☆131 Updated last month
NVlabs / hymba
☆202 Updated 11 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 10 months ago
foundation-model-stack / bamba
Train, tune, and run inference with the Bamba model
☆136 Updated 5 months ago
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆105 Updated 4 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆302 Updated last month
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆155 Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66 Updated last year
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197 Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162 Updated 7 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆360 Updated 11 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆194 Updated last year
RobertCsordas / moeut
☆89 Updated last year
JoeLi12345 / nGPT
An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108 Updated 8 months ago
IST-DASLab / QuEST
Work in progress.
☆75 Updated last week
llm-random / llm-random
☆205 Updated this week
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆232 Updated last month
microsoft / GRIN-MoE
GRadient-INformed MoE
☆264 Updated last year
QuixiAI / grokadamw
☆136 Updated last year
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78 Updated last year
SakanaAI / evo-memory
Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.
☆329 Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆86 Updated 7 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆102 Updated last year
Aleph-Alpha-Research / trigrams
☆58 Updated 2 weeks ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58-bit language models
☆99 Updated 6 months ago