xjdr-alt / simple_transformer
Simple Transformer in Jax
☆119 Updated 5 months ago
Related projects
Alternatives and complementary repositories for simple_transformer
- look how they massacred my boy ☆58 Updated last month
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆100 Updated this week
- ☆95 Updated last month
- smolLM with Entropix sampler on pytorch ☆139 Updated 3 weeks ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 Updated last year
- A puzzle to learn about prompting ☆122 Updated last year
- smol models are fun too ☆77 Updated 2 weeks ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆56 Updated 2 weeks ago
- The history files when recording human interaction while solving ARC tasks ☆95 Updated this week
- MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user… ☆152 Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 Updated this week
- Following master Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆167 Updated 3 months ago
- Normalized Transformer (nGPT) ☆87 Updated this week
- Extract full next-token probabilities via language model APIs ☆229 Updated 9 months ago
- Helpers and such for working with Lambda Cloud ☆51 Updated last year
- Turing machines, Rule 110, and A::B reversal using Claude 3 Opus. ☆60 Updated 6 months ago
- code for training & evaluating Contextual Document Embedding models ☆119 Updated this week
- seqax = sequence modeling + JAX ☆134 Updated 4 months ago
- ☆27 Updated 4 months ago
- An introduction to LLM Sampling ☆65 Updated 2 weeks ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆95 Updated 3 weeks ago
- ☆24 Updated 7 months ago
- Long context evaluation for large language models ☆190 Updated this week
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. ☆81 Updated last year
- ☆20 Updated 3 weeks ago
- papers.day ☆79 Updated 11 months ago
- A really tiny autograd engine ☆87 Updated 7 months ago
- ☆197 Updated 4 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆121 Updated this week
- Gradient descent is cool and all, but what if we could delete it? ☆102 Updated last week