mengwanguc / gpemu
GPEmu, a GPU emulator for faster and cheaper prototyping and evaluation of deep learning system research
☆30Updated 10 months ago
Alternatives and similar repositories for gpemu
Users that are interested in gpemu are comparing it to the libraries listed below
- Tensor library & inference framework for machine learning☆112Updated 2 weeks ago
- PyTorch script hot swap: Change code without unloading your LLM from VRAM☆124Updated 6 months ago
- Hierarchical Navigable Small Worlds☆101Updated 2 months ago
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024☆183Updated last year
- ☆196Updated 5 months ago
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.☆381Updated this week
- ☆443Updated last month
- Hand-Rolled GPU communications library☆41Updated this week
- The Engineer's Guide to Deep-Learning☆36Updated 9 months ago
- Algebraic enhancements for GEMM & AI accelerators☆279Updated 7 months ago
- tiny code to access tenstorrent blackhole☆59Updated 4 months ago
- ☆49Updated 5 years ago
- ☆248Updated last year
- Samples of good AI generated CUDA kernels☆91Updated 4 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.☆64Updated 9 months ago
- Standalone command-line tool for compiling Triton kernels☆18Updated last year
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator☆214Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C☆48Updated 7 months ago
- throwaway GPT inference☆140Updated last year
- Gradual typing for tensor shapes in Rust☆74Updated 3 months ago
- ☆16Updated 3 months ago
- Second attempt at ML lib, cleaner, better tests☆61Updated this week
- The Finite Field Assembly Programming Language☆36Updated 5 months ago
- This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited r…☆167Updated last year
- fast combinations calculation in jax☆36Updated last year
- First token cutoff sampling inference example☆30Updated last year
- Apache Arrow-compatible space-efficient "tape" class in pure Rust to be used with StringZilla for GPU, NUMA, and disk transfers of variab…☆27Updated last week
- Make Triton easier☆48Updated last year
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?☆196Updated this week
- High-Performance SGEMM on CUDA devices☆107Updated 9 months ago