AnswerDotAI / gpu.cpp
A lightweight library for portable low-level GPU computation using WebGPU.
☆3,944 Updated 4 months ago
Alternatives and similar repositories for gpu.cpp
Users who are interested in gpu.cpp are comparing it to the libraries listed below.
- Implementation for MatMul-free LM. ☆3,052 Updated 2 months ago
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,624 Updated 5 months ago
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,728 Updated this week
- CUDA Core Compute Libraries ☆2,162 Updated this week
- ☆1,281 Updated last year
- Tile primitives for speedy kernels ☆3,139 Updated this week
- On-device AI across mobile, embedded and edge for PyTorch ☆4,258 Updated this week
- Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception ha… ☆1,899 Updated last month
- nanobind: tiny and efficient C++/Python bindings ☆3,347 Updated this week
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆2,449 Updated last week
- An efficient C++20 GPU numerical computing library with Python-like syntax ☆1,402 Updated last week
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,652 Updated this week
- A Python framework for accelerated simulation, data generation and spatial computing. ☆6,191 Updated last week
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ☆2,901 Updated last year
- CUDA Python: Performance meets Productivity ☆3,161 Updated this week
- Inference Llama 2 in one file of pure C ☆19,162 Updated last year
- CoreNet: A library for training deep neural networks ☆7,016 Updated 4 months ago
- ☆1,074 Updated 8 months ago
- UNet diffusion model in pure CUDA ☆661 Updated last year
- A modern model graph visualizer and debugger ☆1,384 Updated this week
- PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri… ☆1,440 Updated last week
- Tensor library for machine learning ☆13,923 Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,183 Updated 5 months ago
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ☆4,754 Updated 6 months ago
- Deep learning at the speed of light. ☆2,767 Updated this week
- PyTorch native quantization and sparsity for training and inference ☆2,668 Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,973 Updated this week
- If tinygrad wasn't small enough for you... ☆774 Updated last year
- A massively parallel, high-level programming language ☆19,157 Updated 8 months ago
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,863 Updated 7 months ago