kfish / micrograd-cpp-2023

A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library

☆14

Alternatives and similar repositories for micrograd-cpp-2023

Users who are interested in micrograd-cpp-2023 are comparing it to the libraries listed below.

rbga / CUDA-Merge-and-Bitonic-Sort
Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sort…
☆15 Updated last year
AyakaGEMM / Hands-on-MLIR
☆17 Updated last year
KhronosGroup / SYCL_Reference
SYCL Reference Manual
☆28 Updated last year
syoyo / safetensors-cpp
Header-only safetensors loader and saver in C++
☆62 Updated 3 weeks ago
intel / graph-compiler
MLIR-based toolkit targeting Intel heterogeneous hardware
☆44 Updated 3 months ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆97 Updated 10 months ago
ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆52 Updated 2 months ago
ROCm / hipTensor
AMD’s C++ library for accelerating tensor primitives
☆42 Updated this week
ROCm / half
☆19 Updated 2 weeks ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆62 Updated 9 months ago
satishphd / Teaching-Intel-Intrinsics-for-SIMD-Parallelism
Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class
☆15 Updated 3 months ago
CoffeeBeforeArch / mmul
Serial and parallel implementations of matrix multiplication
☆41 Updated 4 years ago
Chtholly-Boss / swizzle
A practical way of learning Swizzle
☆20 Updated 4 months ago
XiaoSong9905 / HPC-Notes
Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]
☆67 Updated 2 years ago
libxsmm / tpp-mlir
TPP experimentation on MLIR for linear algebra
☆131 Updated last week
makslevental / mmlir
A minimal (really) out-of-tree MLIR example
☆44 Updated 3 weeks ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆88 Updated this week
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆54 Updated 5 months ago
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆133 Updated 5 months ago
cupbop / CuPBoP
A framework that supports executing unmodified CUDA source code on non-NVIDIA devices.
☆127 Updated 5 months ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆67 Updated 4 years ago
tlc-pack / libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
☆110 Updated 8 months ago
carlushuang / gcnasm
amdgpu example code in hip/asm
☆32 Updated this week
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆99 Updated last year
MPACT-ORG / mpact-compiler
Retargetable ML compilers for the twenty-first century!
☆13 Updated last month
ROCm / rocWMMA
rocWMMA
☆114 Updated last week
CUDACommunity / CUDACommunityMeetup2021
☆23 Updated 3 years ago
l1nkr / DL-Compiler-Navigation
Machine Learning Compiler Road Map
☆43 Updated last year
GaoYusong / llm.cpp
A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity.
☆33 Updated 10 months ago
gpuocelot / gpuocelot
GPUOcelot: A dynamic compilation framework for PTX
☆192 Updated 3 months ago