elinx / ugrad

A C++ implementation of the scalar-valued autograd engine micrograd

☆23

Alternatives and similar repositories for ugrad

Users who are interested in ugrad are comparing it to the libraries listed below.

GaoYusong / llm.cpp
A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity.
☆34 Updated 10 months ago
VimalWill / TinyCompiler
MLIR based Tiny Graph Compiler [dev-stage]
☆18 Updated 7 months ago
ubermenchh / Flash
my little linear algebra library
☆46 Updated 11 months ago
ysh329 / OpenCL-101
Learn OpenCL step by step.
☆136 Updated 2 years ago
lucasdelimanogueira / PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
☆150 Updated last year
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆99 Updated last year
CoffeeBeforeArch / mmul
Serial and parallel implementations of matrix multiplication
☆41 Updated 4 years ago
AyakaGEMM / Hands-on-MLIR
☆17 Updated last year
salykova / sgemm.c
Multi-Threaded FP32 Matrix Multiplication on x86 CPUs
☆352 Updated 2 months ago
h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆65 Updated 7 years ago
Ricardicus / recurrent-neural-net
A recurrent (LSTM) neural network in C
☆94 Updated 3 years ago
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆133 Updated last year
gau-nernst / quantized-training
Explore training for quantized models
☆18 Updated this week
BobMcDear / neural-network-cuda
Neural network from scratch in CUDA/C++
☆80 Updated 5 months ago
iml130 / nncg
NNCG: A Neural Network Code Generator
☆35 Updated 10 months ago
ShaYeBuHui01 / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆16 Updated last year
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆137 Updated 5 months ago
jameswdelancey / llama3.c
A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode…
☆128 Updated 11 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆196 Updated 11 months ago
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆57 Updated 5 months ago
wzh99 / relay-mlir
An MLIR-based toy DL compiler for TVM Relay.
☆58 Updated 2 years ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆63 Updated 9 months ago
seb-v / fp32_sgemm_amd
Super fast FP32 matrix multiplication on RDNA3
☆64 Updated 2 months ago
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆96 Updated 5 months ago
benja263 / Integer-Only-Inference-for-Deep-Learning-in-Native-C
Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation.
☆24 Updated 3 years ago
ssloy / tinyoptimizer
Can I make an *optimizing* compiler under 1k lines of code?
☆60 Updated 4 months ago
csukuangfj / OpenCNN
An Open Convolutional Neural Network Framework in C++ From Scratch
☆65 Updated 4 years ago
dpuyda / scheduling
A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs.
☆53 Updated 6 months ago
astojanov / Clover
Clover: Quantized 4-bit Linear Algebra Library
☆114 Updated 7 years ago
apoorvnandan / lilgrad
pytorch from scratch in pure C/CUDA and python
☆40 Updated 8 months ago