mengwanguc / gpemu
GPEmu, a GPU emulator for faster and cheaper prototyping and evaluation of deep learning system research
☆27 Updated 9 months ago
Alternatives and similar repositories for gpemu
Users that are interested in gpemu are comparing it to the libraries listed below
- Tensor library & inference framework for machine learning ☆109 Updated last week
- PyTorch script hot swap: Change code without unloading your LLM from VRAM ☆126 Updated 4 months ago
- The Engineer's Guide to Deep-Learning ☆37 Updated 7 months ago
- ☆196 Updated 4 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆49 Updated 5 months ago
- tiny code to access Tenstorrent Blackhole ☆60 Updated 3 months ago
- Standalone command-line tool for compiling Triton kernels ☆18 Updated 11 months ago
- Hierarchical Navigable Small Worlds ☆101 Updated 3 weeks ago
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024 ☆181 Updated last year
- ☆410 Updated last week
- xet client tech, used in huggingface_hub ☆190 Updated this week
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆325 Updated this week
- Inference of Mamba models in pure C ☆191 Updated last year
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆211 Updated last year
- ☆249 Updated last year
- Algebraic enhancements for GEMM & AI accelerators ☆279 Updated 6 months ago
- Make Triton easier ☆47 Updated last year
- Pivotal Token Search ☆123 Updated last month
- A playground to make it easy to try crazy things ☆33 Updated 2 months ago
- A tiny autograd engine with a JAX-like API ☆74 Updated last month
- time to learn mlx ☆40 Updated 3 months ago
- Samples of good AI generated CUDA kernels ☆89 Updated 3 months ago
- A probabilistic approximate DNF counter ☆37 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆54 Updated this week
- A library for incremental loading of large PyTorch checkpoints ☆56 Updated 2 years ago
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆111 Updated this week
- Gradual typing for tensor shapes in Rust ☆75 Updated 2 months ago
- DiscoGrad - automatically differentiate across conditional branches in C++ programs ☆204 Updated 11 months ago
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 Updated 7 months ago
- a curated list of data for reasoning ai ☆137 Updated last year