bhosmer / mm
☆123 · Updated 4 months ago
Alternatives and similar repositories for mm:
Users who are interested in mm are comparing it to the libraries listed below.
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- ☆215 · Updated 8 months ago
- Solve puzzles. Learn CUDA. ☆63 · Updated last year
- ☆76 · Updated 8 months ago
- An interactive exploration of Transformer programming. ☆261 · Updated last year
- Resources from the EleutherAI Math Reading Group ☆53 · Updated 3 weeks ago
- A puzzle to learn about prompting ☆124 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆62 · Updated 7 months ago
- ☆165 · Updated last year
- Automatic gradient descent ☆207 · Updated last year
- ☆87 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆108 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆100 · Updated 4 months ago
- Losslessly encode text natively with arithmetic coding and HuggingFace Transformers ☆73 · Updated 7 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆122 · Updated 11 months ago
- Implementation of Flash Attention in Jax ☆206 · Updated last year
- ☆220 · Updated last month
- Token Omission Via Attention ☆124 · Updated 5 months ago
- Alex Krizhevsky's original code from Google Code ☆190 · Updated 9 years ago
- ring-attention experiments ☆128 · Updated 5 months ago
- Gpu benchmark ☆55 · Updated last month
- Latent Diffusion Language Models ☆68 · Updated last year
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆221 · Updated this week
- Puzzles for exploring transformers ☆335 · Updated last year
- Tools for working with the Abstraction & Reasoning Corpus ☆180 · Updated 7 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆186 · Updated 9 months ago
- seqax = sequence modeling + JAX ☆150 · Updated last week
- Understand and test language model architectures on synthetic tasks. ☆185 · Updated 2 weeks ago
- ☆197 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 3 months ago