vaibhawvipul / performance-engineering
☆22 · Updated last year
Related projects:
- ⛰️ RockyML - A High-Performance Scientific Computing Framework for Non-smooth Machine Learning Problems ☆16 · Updated last year
- Custom kernels in the Triton language for accelerating LLMs ☆14 · Updated 5 months ago
- LLM training in simple, raw C/CUDA ☆79 · Updated 4 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C. ☆21 · Updated 2 months ago
- ML/DL Math and Method notes ☆56 · Updated 9 months ago
- Projects completed under LinuxWorld Informatics Ltd. - MLOps Training. ☆12 · Updated 4 years ago
- Article about deploying machine learning models using gRPC, PyTorch, and asyncio ☆24 · Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆37 · Updated this week
- ☆21 · Updated last week
- A Gentle Principled Introduction to Deep Reinforcement Learning ☆19 · Updated last month
- Inference Llama 2 in C++ ☆47 · Updated 4 months ago
- A tracing JIT compiler for PyTorch ☆12 · Updated 2 years ago
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆113 · Updated last year
- Make Triton easier ☆39 · Updated 3 months ago
- Notes on "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj (4th ed.) ☆48 · Updated last month
- Rust implementation of micrograd ☆51 · Updated 2 months ago
- Guides and examples to help achieve optimal performance on an NVIDIA Grace CPU ☆11 · Updated last month
- Learning about CUDA by writing PTX code. ☆28 · Updated 6 months ago
- ☆22 · Updated 8 months ago
- MLPerf™ logging library ☆30 · Updated last week
- ☆22 · Updated 2 years ago
- NVIDIA tools guide ☆60 · Updated last month
- How to build an LLVM backend, published by Packt ☆14 · Updated last week
- HNSW tutorial ☆55 · Updated 7 months ago
- SKIP for AI ☆20 · Updated 4 years ago
- Personal notes on CUDA programming ☆48 · Updated last year
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆14 · Updated 3 years ago
- Code for a workshop hosted at the MLOps World Summit '22 ☆15 · Updated 2 years ago
- ☆22 · Updated last week
- Computing the greatest common divisor with transformers, source code for the paper https://arxiv.org/abs/2308.15594 ☆11 · Updated 5 months ago