NVIDIA / ngptLinks

Normalized Transformer (nGPT)

☆194

Alternatives and similar repositories for ngpt

Users that are interested in ngpt are comparing it to the libraries listed below

Sorting:

lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆293Updated 6 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆217Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240Updated 2 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆212Updated 5 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆243Updated 5 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135Updated 11 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated last month
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆169Updated last month
HanGuo97 / log-linear-attention
☆256Updated 5 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆379Updated 2 months ago
apple / ml-sigmoid-attention
☆303Updated 7 months ago
facebookresearch / PhysicsLM4
Physics of Language Models, Part 4
☆262Updated 4 months ago
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆232Updated last month
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197Updated last year
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆360Updated 11 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆101Updated last year
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180Updated 5 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆186Updated 10 months ago
huggingface / picotron_tutorial
☆224Updated last week
kyleliang919 / Super_Muon
☆66Updated 8 months ago
llm-random / llm-random
☆205Updated this week
zyushun / Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
☆445Updated 6 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆311Updated 2 weeks ago
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆64Updated last month
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆103Updated 2 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆106Updated last month
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆192Updated last year