xjdr-alt / simple_transformer
Simple Transformer in Jax
☆136 Updated 10 months ago
Alternatives and similar repositories for simple_transformer:
Users who are interested in simple_transformer are comparing it to the libraries listed below:
- smol models are fun too ☆92 Updated 5 months ago
- The history files when recording human interaction while solving ARC tasks ☆106 Updated this week
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆96 Updated last month
- smolLM with Entropix sampler on pytorch ☆151 Updated 5 months ago
- ☆71 Updated this week
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 Updated last year
- ☆55 Updated last month
- look how they massacred my boy ☆63 Updated 6 months ago
- ☆97 Updated 6 months ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆42 Updated last month
- A puzzle to learn about prompting ☆127 Updated last year
- ☆108 Updated 4 months ago
- ☆38 Updated 8 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆62 Updated this week
- This repository contains the simple llama3 implementation in pure jax. ☆63 Updated 2 months ago
- A really tiny autograd engine ☆92 Updated last year
- ☆27 Updated 9 months ago
- supporting pytorch FSDP for optimizers ☆80 Updated 4 months ago
- An introduction to LLM Sampling ☆77 Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆111 Updated 4 months ago
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆174 Updated 8 months ago
- Gradient descent is cool and all, but what if we could delete it? ☆103 Updated last week
- Draw more samples ☆189 Updated 10 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆105 Updated 5 months ago
- Extract full next-token probabilities via language model APIs ☆241 Updated last year
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆64 Updated 5 months ago
- ☆215 Updated 9 months ago
- seqax = sequence modeling + JAX ☆154 Updated 2 weeks ago
- compute, storage, and networking infra at home ☆65 Updated last year
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ☆82 Updated last year