xjdr-alt / simple_transformer
Simple Transformer in Jax
☆136 · Updated 8 months ago
Alternatives and similar repositories for simple_transformer:
Users who are interested in simple_transformer are comparing it to the repositories listed below.
- smolLM with Entropix sampler on pytorch ☆150 · Updated 4 months ago
- smol models are fun too ☆88 · Updated 3 months ago
- look how they massacred my boy ☆63 · Updated 4 months ago
- ☆97 · Updated 4 months ago
- ☆100 · Updated 2 months ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 · Updated last year
- supporting pytorch FSDP for optimizers ☆77 · Updated 2 months ago
- Gradient descent is cool and all, but what if we could delete it? ☆103 · Updated this week
- The history files when recording human interaction while solving ARC tasks ☆97 · Updated last week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆99 · Updated 3 months ago
- ☆38 · Updated 7 months ago
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆168 · Updated 7 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆63 · Updated 3 months ago
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user… ☆165 · Updated this week
- seqax = sequence modeling + JAX ☆145 · Updated this week
- compute, storage, and networking infra at home ☆64 · Updated last year
- Draw more samples ☆186 · Updated 8 months ago
- ☆27 · Updated 7 months ago
- DeMo: Decoupled Momentum Optimization ☆181 · Updated 3 months ago
- A puzzle to learn about prompting ☆124 · Updated last year
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Updated 4 months ago
- A really tiny autograd engine ☆89 · Updated 10 months ago
- ☆122 · Updated 2 weeks ago
- An introduction to LLM Sampling ☆75 · Updated 2 months ago
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ☆82 · Updated last year
- ☆60 · Updated last month
- Ultra low overhead NVIDIA GPU telemetry plugin for telegraf with memory temperature readings. ☆63 · Updated 7 months ago
- Extract full next-token probabilities via language model APIs ☆231 · Updated last year