JoeLi12345 / nGPT

An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)

☆108

Alternatives and similar repositories for nGPT

Users interested in nGPT are comparing it to the libraries listed below.


VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73 · Updated 7 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63 · Updated last year
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197 · Updated last year
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 · Updated 10 months ago
SinatrasC / entropix-smollm
SmolLM with the Entropix sampler in PyTorch
☆149 · Updated last year
leloykun / modded-nanogpt
NanoGPT (124M) quality in 2.67B tokens
☆28 · Updated 2 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆111 · Updated 7 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59 · Updated last month
joey00072 / ohara
Collection of autoregressive model implementations
☆86 · Updated 7 months ago
joey00072 / Multi-Head-Latent-Attention-MLA-
Working implementation of DeepSeek MLA
☆45 · Updated 10 months ago
QuixiAI / grokadamw
☆136 · Updated last year
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 · Updated 11 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆304 · Updated last month
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆107 · Updated 8 months ago
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80 · Updated 2 months ago
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆84 · Updated 3 months ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218 · Updated last month
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆101 · Updated last year
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆139 · Updated last year
kubernetes-bad / reward-composer
Lego for GRPO
☆30 · Updated 6 months ago
okarthikb / state-space-models
☆28 · Updated last year
tokenbender / avataRL
RL from zero pretrain: can it be done? Yes.
☆281 · Updated 2 months ago
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45 · Updated 8 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆186 · Updated 10 months ago
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆249 · Updated 3 months ago
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 · Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆18 · Updated 4 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆112 · Updated last month
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249 · Updated 10 months ago
doomslide / hyperobject
Plotting (entropy, varentropy) for small LMs
☆99 · Updated 6 months ago