xjdr-alt / simple_transformer

Simple Transformer in Jax

☆139

Alternatives and similar repositories for simple_transformer

Users who are interested in simple_transformer are comparing it to the libraries listed below.

SinatrasC / entropix-smollm
smolLM with Entropix sampler on pytorch
☆149 Updated last year
xjdr-alt / entropix-local
smol models are fun too
☆92 Updated last year
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108 Updated 8 months ago
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆92 Updated 2 months ago
yacineMTB / just-large-models
Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip.
☆44 Updated 2 years ago
xjdr-alt / llmri
look how they massacred my boy
☆63 Updated last year
joey00072 / Tinytorch
A really tiny autograd engine
☆96 Updated 6 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73 Updated 7 months ago
doomslide / attention-graph
A graph visualization of attention
☆57 Updated 6 months ago
doomslide / hyperobject
Plotting (entropy, varentropy) for small LMs
☆99 Updated 6 months ago
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45 Updated 8 months ago
divyamakkar0 / JAXformer
A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.
☆105 Updated 2 months ago
haraschax / nograd
Gradient descent is cool and all, but what if we could delete it?
☆104 Updated 3 months ago
jerber / lang-jepa
☆128 Updated 11 months ago
okarthikb / state-space-models
☆28 Updated last year
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 Updated 11 months ago
JD-P / minihf
MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user…
☆182 Updated last month
neoneye / ARC-Interactive-History-Dataset
The history files when recording human interaction while solving ARC tasks
☆118 Updated 3 weeks ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆281 Updated 2 months ago
srush / GPTWorld
A puzzle to learn about prompting
☆135 Updated 2 years ago
LeonGuertler / UnstableBaselines
☆107 Updated this week
gautierdag / bpeasy
Fast bare-bones BPE for modern tokenizer training
☆171 Updated 5 months ago
strangeloopcanon / LLMRank
PageRank for LLMs
☆51 Updated 2 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197 Updated last year
xjdr-alt / muzero_sketch
☆40 Updated last year
rgreenblatt / arc_draw_more_samples_pub
Draw more samples
☆196 Updated last year
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218 Updated last month
Laz4rz / GPT-2
Following Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish
☆172 Updated last year
Figura-Labs-Inc / telegraf_nv_export
Ultra low overhead NVIDIA GPU telemetry plugin for telegraf with memory temperature readings.
☆63 Updated last year
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆248 Updated last year