ashvardanian / cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python using PyBind11, without a single line of CMake
☆21 · Updated 3 weeks ago
Alternatives and similar repositories for cuda-python-starter-kit:
Users who are interested in cuda-python-starter-kit are comparing it to the libraries listed below
- A list of awesome resources and blogs on topics related to Unum ☆37 · Updated 5 months ago
- Learning how to write "Less Slow" code in Python, from numerical micro-kernels to coroutines, ranges, and polymorphic state machines ☆30 · Updated last week
- ☆12 · Updated last year
- LLM training in simple, raw C/CUDA ☆92 · Updated 11 months ago
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆85 · Updated last month
- ☆21 · Updated last month
- High-Performance SGEMM on CUDA devices ☆88 · Updated 2 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆109 · Updated this week
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆75 · Updated 2 months ago
- Example ML projects that use the Determined library. ☆30 · Updated 6 months ago
- ScalarLM - a unified training and inference stack ☆31 · Updated last week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆127 · Updated last year
- Efficient BM25 with DuckDB 🦆 ☆44 · Updated 3 months ago
- NLP with Rust for Python 🦀🐍 ☆61 · Updated 10 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆169 · Updated last week
- Rust Implementation of micrograd ☆51 · Updated 9 months ago
- Notes and artifacts from the ONNX steering committee ☆25 · Updated this week
- Learn CUDA with PyTorch ☆19 · Updated 2 months ago
- A parallel framework for training deep neural networks ☆57 · Updated 2 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆35 · Updated this week
- ☆11 · Updated 2 months ago
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 · Updated 11 months ago
- GPU Environment Management for Visual Studio Code ☆37 · Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆255 · Updated this week
- Inference Llama 2 in C++ ☆44 · Updated 11 months ago
- Awesome utilities for performance profiling ☆169 · Updated 3 weeks ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆53 · Updated 3 weeks ago
- ☆14 · Updated last month
- ☆30 · Updated last week
- Reference Kernels for the Leaderboard ☆23 · Updated last month