catswe / LinearKAN
LinearKAN: A very fast implementation of Kolmogorov-Arnold Networks
☆17 · Updated last month
Alternatives and similar repositories for LinearKAN
Users who are interested in LinearKAN are comparing it to the libraries listed below.
- ☆16 · Updated last year
- General Matrix Multiplication using NVIDIA Tensor Cores ☆27 · Updated 11 months ago
- making the official triton tutorials actually comprehensible ☆85 · Updated 4 months ago
- Competitive GPU kernel optimization platform. ☆144 · Updated last week
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆155 · Updated 2 months ago
- Learnings and programs related to CUDA ☆432 · Updated 6 months ago
- ☆537 · Updated 5 months ago
- Solve puzzles to improve your tinygrad skills! ☆175 · Updated 2 months ago
- Learning about CUDA by writing PTX code. ☆151 · Updated last year
- Minimal yet performant LLM examples in pure JAX ☆225 · Updated last week
- Implementation of Diffusion Transformer (DiT) in JAX ☆300 · Updated last year
- ☆88 · Updated 2 months ago
- 6.790 | Machine Learning | Draft Site/Notes ☆14 · Updated last month
- Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face. ☆26 · Updated 10 months ago
- ☆33 · Updated last year
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆801 · Updated this week
- Tensor library with autograd using only Rust's standard library ☆71 · Updated last year
- Mapping out the "memory" of neural nets with data attribution ☆37 · Updated this week
- ☆287 · Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆195 · Updated 7 months ago
- small auto-grad engine inspired by Karpathy's micrograd and PyTorch ☆276 · Updated last year
- For optimization algorithm research and development. ☆556 · Updated 3 weeks ago
- ☆233 · Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆340 · Updated last month
- Minimal JAX implementation unifying Diffusion and Flow Matching algorithms as alternative strategies for transporting data distributions. ☆61 · Updated 3 weeks ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆451 · Updated 10 months ago
- ☆461 · Updated last year
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆202 · Updated 2 years ago
- ☆408 · Updated 9 months ago
- ☆91 · Updated last year