karpathy / rustbpe
The missing tiktoken training code
☆162Updated this week
Alternatives and similar repositories for rustbpe
Users who are interested in rustbpe are comparing it to the libraries listed below.
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.☆329Updated 2 months ago
- Simple byte pair encoding mechanism used for the tokenization process, written purely in C☆142Updated last year
- SIMD quantization kernels☆93Updated 4 months ago
- Tensor library with autograd using only Rust's standard library☆71Updated last year
- Load compute kernels from the Hub☆357Updated 3 weeks ago
- Simple MPI implementation for prototyping or learning☆297Updated 5 months ago
- Quantized LLM training in pure CUDA/C++.☆230Updated this week
- Dion optimizer algorithm☆413Updated this week
- MoE training for Me and You and maybe other people☆315Updated this week
- Where GPUs get cooked 👩‍🍳🔥☆345Updated 3 months ago
- A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗☆1,191Updated this week
- (WIP) A small but powerful, homemade PyTorch from scratch.☆666Updated last week
- Alex Krizhevsky's original code from Google Code☆198Updated 9 years ago
- Async RL Training at Scale☆976Updated this week
- UNet diffusion model in pure CUDA☆659Updated last year
- Implementation of Diffusion Transformer (DiT) in JAX☆300Updated last year
- 👷 Build compute kernels☆198Updated 2 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆195Updated 7 months ago
- ☆537Updated 5 months ago
- For optimization algorithm research and development.☆556Updated 2 weeks ago
- Fast bare-bones BPE for modern tokenizer training☆174Updated 6 months ago
- Simple & Scalable Pretraining for Neural Architecture Research☆306Updated last month
- An implementation of the transformer architecture onto an Nvidia CUDA kernel☆202Updated 2 years ago
- ☆178Updated last year
- Learnings and programs related to CUDA☆432Updated 6 months ago
- Learning about CUDA by writing PTX code.☆151Updated last year
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models.☆829Updated 5 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆109Updated 10 months ago
- Best practices & guides on how to write distributed pytorch training code☆562Updated 2 months ago
- Complete solutions to Programming Massively Parallel Processors, 4th Edition☆630Updated 6 months ago