lucidrains / nGPT-pytorchLinks

Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI

☆288

Alternatives and similar repositories for nGPT-pytorch

Users that are interested in nGPT-pytorch are comparing it to the libraries listed below

Sorting:

NVIDIA / ngpt
Normalized Transformer (nGPT)
☆185Updated 8 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆192Updated 4 months ago
goombalab / hnet
H-Net: Hierarchical Network with Dynamic Chunking
☆593Updated 3 weeks ago
apple / ml-sigmoid-attention
☆293Updated 3 months ago
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆216Updated last year
bluorion-com / ZClip
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
☆131Updated last month
zyushun / Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
☆431Updated 2 months ago
dvruette / gidd
Code accompanying the paper "Generalized Interpolating Discrete Diffusion"
☆97Updated last month
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆244Updated 6 months ago
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆532Updated 2 months ago
lucidrains / hyper-connections
Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public
☆88Updated last month
srush / annotated-mamba
Annotated version of the Mamba paper
☆487Updated last year
HanGuo97 / log-linear-attention
☆228Updated last month
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238Updated last month
NVlabs / hymba
☆190Updated 7 months ago
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆226Updated 3 months ago
kvfrans / jax-diffusion-transformer
Implementation of Diffusion Transformer (DiT) in JAX
☆280Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆127Updated 11 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆221Updated 3 weeks ago
HomebrewML / HeavyBall
Efficient optimizers
☆252Updated last week
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆184Updated 10 months ago
llm-random / llm-random
☆192Updated last week
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆190Updated 8 months ago
Haiyang-W / TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
☆567Updated 5 months ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆88Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127Updated 8 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆149Updated last month
dingo-actual / infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…
☆290Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆123Updated 7 months ago
cloneofsimo / scaling-guide
WIP
☆93Updated 11 months ago