xjdr-alt / simple_transformer
Simple Transformer in Jax
☆130 Updated 7 months ago
Alternatives and similar repositories for simple_transformer:
Users interested in simple_transformer are comparing it to the libraries listed below.
- look how they massacred my boy ☆63 Updated 3 months ago
- smolLM with Entropix sampler on pytorch ☆148 Updated 2 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆91 Updated 2 months ago
- smol models are fun too ☆86 Updated 2 months ago
- ☆96 Updated 3 months ago
- DeMo: Decoupled Momentum Optimization ☆171 Updated last month
- supporting pytorch FSDP for optimizers ☆75 Updated last month
- A puzzle to learn about prompting ☆123 Updated last year
- The history files when recording human interaction while solving ARC tasks ☆96 Updated this week
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆62 Updated 2 months ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 Updated last year
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆206 Updated last week
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ☆82 Updated last year
- ☆98 Updated last month
- ☆37 Updated 6 months ago
- An introduction to LLM Sampling ☆75 Updated last month
- Entropy Based Sampling and Parallel CoT Decoding ☆17 Updated 3 months ago
- Modify Entropy Based Sampling to work with Mac Silicon via MLX ☆49 Updated 2 months ago
- ☆203 Updated 6 months ago
- train entropix like a champ! ☆19 Updated 3 months ago
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆167 Updated 5 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆121 Updated 9 months ago
- A really tiny autograd engine ☆89 Updated 9 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆183 Updated 8 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆102 Updated last month
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user… ☆163 Updated this week
- ☆60 Updated last year
- compute, storage, and networking infra at home ☆64 Updated 11 months ago
- Turing machines, Rule 110, and A::B reversal using Claude 3 Opus. ☆59 Updated 8 months ago