seungrokj / ai_sprint_paris
☆14 · Updated 2 months ago
Alternatives and similar repositories for ai_sprint_paris
Users interested in ai_sprint_paris are comparing it to the libraries listed below.
- PyTorch Single Controller ☆423 · Updated this week
- Attention Kernels for Symmetric Power Transformers ☆115 · Updated 3 weeks ago
- seqax = sequence modeling + JAX ☆167 · Updated 2 months ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆121 · Updated 2 weeks ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ☆142 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆57 · Updated this week
- How to ensure correctness and ship LLM-generated kernels in PyTorch ☆58 · Updated last week
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism ☆141 · Updated 5 months ago
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with JAX and Equinox ☆24 · Updated 11 months ago
- Experiment of using Tangent to autodiff Triton ☆81 · Updated last year
- Minimal yet performant LLM examples in pure JAX ☆160 · Updated this week
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism ☆91 · Updated 3 weeks ago
- ☆224 · Updated 3 months ago
- Automatic differentiation for Triton kernels ☆11 · Updated last month
- A bunch of kernels that might make stuff slower 😉 ☆59 · Updated this week
- SIMD quantization kernels ☆87 · Updated 2 weeks ago
- ComputeEval: Evaluating Large Language Models for CUDA Code Generation. A framework designed to generate and evaluate CUDA code from Lar… ☆66 · Updated 3 months ago
- High-performance SGEMM on CUDA devices ☆101 · Updated 8 months ago
- NSA Triton kernels written with GPT-5 and Opus 4.1 ☆65 · Updated last month
- A JAX-native LLM post-training library ☆150 · Updated this week
- Scalable and stable parallelization of nonlinear RNNs ☆22 · Updated 3 weeks ago
- 🧱 Modula software package ☆239 · Updated last month
- ☆281 · Updated last year
- Custom Triton kernels for training Karpathy's nanoGPT ☆19 · Updated 11 months ago
- 📄 Small-batch-size training for language models ☆62 · Updated 3 weeks ago
- Learn CUDA with PyTorch ☆84 · Updated this week
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆98 · Updated 6 months ago
- Experimental GPU language with meta-programming ☆23 · Updated last year
- Tree Attention: topology-aware decoding for long-context attention on GPU clusters ☆129 · Updated 9 months ago
- A parallel framework for training deep neural networks ☆63 · Updated 6 months ago