ashvardanian / cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python using PyBind11, without a single line of CMake
☆18 · Updated 4 months ago
Alternatives and similar repositories for cuda-python-starter-kit:
Users interested in cuda-python-starter-kit are comparing it to the libraries listed below:
- Learning how to write "Less Slow" code in Python, from numerical micro-kernels to coroutines, ranges, and polymorphic state machines · ☆16 · Updated this week
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread · ☆18 · Updated 9 months ago
- Tiny Semantic Versioning (SemVer) library with LLMs and GitHub CI, that doesn't depend on 300K lines of JavaScript code and fits in a sin… · ☆18 · Updated this week
- Generate Glue Code in seconds to simplify your Nvidia Triton Inference Server Deployments · ☆16 · Updated 6 months ago
- 🛠 Self-hosted, fast, and consistent remote configuration for apps. · ☆14 · Updated 2 years ago
- Triton backend for managing the model state tensors automatically in sequence batcher · ☆14 · Updated 11 months ago
- GPU Environment Management for Visual Studio Code · ☆37 · Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C · ☆42 · Updated last week
- Efficiently computing & storing token n-grams from large corpora · ☆17 · Updated 3 months ago
- Better bindings for Python · ☆17 · Updated last year
- LLM training in simple, raw C/CUDA · ☆91 · Updated 8 months ago
- Example ML projects that use the Determined library. · ☆25 · Updated 4 months ago
- Efficient BM25 with DuckDB 🦆 · ☆36 · Updated 3 weeks ago
- NLP with Rust for Python 🦀🐍 · ☆60 · Updated 7 months ago
- build your own vector database -- the littlest hnsw · ☆55 · Updated last week
- ☆21 · Updated this week
- Loop Nest - Linear algebra compiler and code generator. · ☆22 · Updated 2 years ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… · ☆17 · Updated 2 months ago
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes · ☆31 · Updated this week
- ☆31 · Updated last week
- Inference Llama 2 in C++ · ☆45 · Updated 8 months ago
- SGEMM that beats cuBLAS · ☆36 · Updated this week
- Vector Database with support for late interaction and token level embeddings. · ☆51 · Updated 3 months ago
- Cortex-compatible model server for Python and TensorFlow · ☆17 · Updated 2 years ago
- Rust Implementation of micrograd · ☆51 · Updated 6 months ago
- Learning Unum's efficient data-processing tools one cool project at a time · ☆11 · Updated last year
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp · ☆14 · Updated last year
- Scripts to prep PC for development use after OS installs · ☆37 · Updated this week
- Evalica, your favourite evaluation toolkit · ☆24 · Updated 2 weeks ago
- ☆12 · Updated this week