yanghaku / tvm-rt-wasm
A high-performance, tiny TVM graph executor library written in C that can compile to WebAssembly and use CUDA or WebGPU as the accelerator.
☆12 · Updated 2 years ago
Alternatives and similar repositories for tvm-rt-wasm
Users interested in tvm-rt-wasm are comparing it to the libraries listed below.
- ☆170 · Updated this week
- A simple general-purpose programming language ☆98 · Updated 3 months ago
- ONNX Serving is a project written in C++ to serve onnx-mlir-compiled models over gRPC and other protocols. Benefiting from C++ implement… ☆25 · Updated 2 months ago
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆57 · Updated 3 years ago
- ☆17 · Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 · Updated 3 months ago
- An MLIR-based toy DL compiler for TVM Relay ☆61 · Updated 3 years ago
- PTX-EMU is a simple emulator for CUDA programs ☆38 · Updated 7 months ago
- ☆26 · Updated 9 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆112 · Updated last year
- Experiments and prototypes associated with IREE or MLIR ☆56 · Updated last year
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications ☆27 · Updated last year
- CUDA SGEMM optimization notes ☆15 · Updated 2 years ago
- ☆26 · Updated 11 months ago
- Triton-to-TVM transpiler ☆22 · Updated last year
- GPTQ inference TVM kernel ☆40 · Updated last year
- Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators ☆35 · Updated 3 years ago
- Play with MLIR right in your browser ☆138 · Updated 2 years ago
- Play GEMM with TVM ☆92 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆104 · Updated this week
- A model compilation solution for various hardware ☆457 · Updated 3 months ago
- AI applications and tools ☆30 · Updated last month
- A prefill-and-decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation ☆118 · Updated 6 months ago
- ☆19 · Updated last year
- FlagTree is a unified compiler supporting multiple AI chip backends for custom deep learning operations, forked from triton-lang… ☆145 · Updated this week
- ☆125 · Updated last year
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance⚡️ ☆135 · Updated 7 months ago
- PTX on XPUs ☆110 · Updated last month
- My study notes for MLSys ☆16 · Updated last year
- A GPU-driven system framework for scalable AI applications ☆122 · Updated 10 months ago