catswe / LinearKAN
LinearKAN: A very fast implementation of Kolmogorov-Arnold Networks
☆18 · Updated 2 months ago
Alternatives and similar repositories for LinearKAN
Users that are interested in LinearKAN are comparing it to the libraries listed below
- ☆89 · Updated 3 months ago
- Dion optimizer algorithm ☆431 · Updated 3 weeks ago
- Quantized LLM training in pure CUDA/C++. ☆238 · Updated 3 weeks ago
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆158 · Updated 2 months ago
- Minimal yet performant LLM examples in pure JAX ☆240 · Updated 3 weeks ago
- SIMD quantization kernels ☆94 · Updated 5 months ago
- 🧱 Modula software package ☆322 · Updated 5 months ago
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism. ☆115 · Updated last month
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆352 · Updated 2 months ago
- JAX-Toolbox ☆382 · Updated this week
- making the official Triton tutorials actually comprehensible ☆111 · Updated 5 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆306 · Updated last year
- ☆28 · Updated 4 months ago
- ☆544 · Updated 6 months ago
- ☆291 · Updated last year
- ☆246 · Updated last year
- coding CUDA every day! ☆73 · Updated this week
- mHC kernels implemented in CUDA ☆249 · Updated 3 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆198 · Updated 8 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 · Updated 10 months ago
- Learning about CUDA by writing PTX code. ☆152 · Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 · Updated 9 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆457 · Updated 11 months ago
- all the materials for cs140e winter 2026 ☆33 · Updated this week
- ☆562 · Updated last year
- Supporting code for the blog post on modular manifolds. ☆115 · Updated 4 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆155 · Updated 2 years ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Updated last year
- minimal Energy-based transformer ☆43 · Updated last month
- Accelerated First Order Parallel Associative Scan ☆196 · Updated last month