elinx / ugrad
A C++ implementation of the scalar-valued autograd engine micrograd
☆23 Updated 4 years ago
Alternatives and similar repositories for ugrad:
Users who are interested in ugrad are comparing it to the libraries listed below.
- A C/C++ implementation of micrograd: a tiny autograd engine with a neural net on top. ☆66 Updated last year
- MLIR-based Tiny Graph Compiler [dev-stage] ☆16 Updated 4 months ago
- A header-only library implementing common mathematical functions using SIMD intrinsics ☆102 Updated last month
- A simple, fast, minimalistic header-only library for running async tasks and executing task graphs. ☆52 Updated 4 months ago
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation. ☆23 Updated 3 years ago
- GPT-2 in C ☆67 Updated 3 months ago
- Simple byte pair encoding mechanism used for the tokenization process, written purely in C ☆129 Updated 4 months ago
- ☆44 Updated 6 years ago
- An Open Convolutional Neural Network Framework in C++ From Scratch ☆61 Updated 4 years ago
- Neural network from scratch in CUDA/C++ ☆78 Updated 2 months ago
- CUDA Matrix Multiplication Optimization ☆177 Updated 8 months ago
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆123 Updated 8 months ago
- An implementation of the transformer architecture onto an NVIDIA CUDA kernel ☆174 Updated last year
- Scalar-valued Automatic Differentiation library in C ☆49 Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆343 Updated last month
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆146 Updated 9 months ago
- C implementation of the L-Mul f32/f16 multiplications from the paper: https://arxiv.org/html/2410.00907 ☆27 Updated 5 months ago
- GPT2 implementation in C++ using Ort ☆26 Updated 4 years ago
- High-Performance SGEMM on CUDA devices ☆87 Updated 2 months ago
- TinyFive is a lightweight RISC-V emulator and assembler written in Python with neural network examples ☆56 Updated last year
- A single header-only C++ library for automatic / algorithmic differentiation. ☆13 Updated 2 years ago
- PyTorch from scratch in pure C/CUDA and Python ☆40 Updated 5 months ago
- Neural Network framework using Backpropagation in C ☆74 Updated 3 years ago
- Learning about CUDA by writing PTX code. ☆125 Updated last year
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆13 Updated last year
- Asynchronous/distributed speculative evaluation for llama3 ☆39 Updated 7 months ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆28 Updated last year
- Learn OpenCL step by step. ☆134 Updated 2 years ago
- A recurrent (LSTM) neural network in C ☆92 Updated 3 years ago