ashvardanian / cuda-python-starter-kit
Parallel computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python using PyBind11, without a single line of CMake
☆21 · Updated 3 weeks ago
Alternatives and similar repositories for cuda-python-starter-kit:
- A list of awesome resources and blogs on topics related to Unum ☆37 · Updated 5 months ago
- Learning how to write "Less Slow" code in Python, from numerical micro-kernels to coroutines, ranges, and polymorphic state machines ☆30 · Updated last week
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆85 · Updated last month
- High-Performance SGEMM on CUDA devices ☆87 · Updated 2 months ago
- ScalarLM - a unified training and inference stack ☆31 · Updated last week
- LLM training in simple, raw C/CUDA ☆92 · Updated 11 months ago
- ☆21 · Updated 3 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆169 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆35 · Updated this week
- ☆12 · Updated last year
- 👷 Build compute kernels ☆24 · Updated this week
- ☆22 · Updated this week
- Because it's there. ☆16 · Updated 6 months ago
- Documentation retrieval system to help LLMs navigate less-popular (yet often more powerful) Python libraries ☆12 · Updated 10 months ago
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 · Updated 11 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆127 · Updated last year
- ☆14 · Updated last month
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 · Updated last week
- Fast and vectorizable algorithms for searching in a vector of sorted floating-point numbers ☆133 · Updated 3 months ago
- Rust implementation of micrograd ☆51 · Updated 8 months ago
- Horizon chart for CPU/GPU/Neural Engine utilization monitoring on Apple M1/M2 and NVIDIA GPUs on Linux ☆25 · Updated 3 weeks ago
- NLP with Rust for Python 🦀🐍 ☆61 · Updated 9 months ago
- Learn CUDA with PyTorch ☆19 · Updated 2 months ago
- GPU prices aggregator for cloud providers ☆34 · Updated this week
- I have no idea what I'm doing, but llm.c in Rust ☆12 · Updated 8 months ago
- A parallel framework for training deep neural networks ☆57 · Updated 2 weeks ago
- Efficient BM25 with DuckDB 🦆 ☆44 · Updated 3 months ago
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆255 · Updated this week
- Vector Database with support for late interaction and token-level embeddings. ☆53 · Updated 6 months ago
- Tiny Semantic Versioning (SemVer) library with LLMs and GitHub CI, that doesn't depend on 300K lines of JavaScript code and fits in a sin… ☆20 · Updated 2 months ago