lcy-seso / DLFrameworkTest
My tests and experiments with some popular deep learning frameworks.
☆12 Updated this week
Alternatives and similar repositories for DLFrameworkTest:
Users who are interested in DLFrameworkTest are comparing it to the repositories listed below:
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆19 Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆15 Updated 3 months ago
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆56 Updated this week
- An external memory allocator example for PyTorch. ☆14 Updated 3 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆19 Updated last week
- TiledKernel is a code generation library based on macro kernels and a memory hierarchy graph data structure. ☆19 Updated 9 months ago
- An Attention Superoptimizer ☆21 Updated last month
- modified cutlass ☆14 Updated 4 years ago
- ThrillerFlow is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 Updated 2 months ago
- My study notes for MLSys ☆14 Updated 3 months ago
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 Updated 2 years ago
- Benchmark PyTorch Custom Operators ☆13 Updated last year
- Optimize tensor programs fast with Felix, a gradient descent autotuner. ☆24 Updated 9 months ago
- An experimental ahead of time compiler for Relay. ☆50 Updated 4 years ago
- GPTQ inference TVM kernel ☆38 Updated 9 months ago
- Example for running IREE in a bare-metal Arm environment. ☆29 Updated last month
- Yet another Polyhedra Compiler for DeepLearning ☆19 Updated last year
- Static analysis framework for analyzing programs written in TVM's Relay IR. ☆27 Updated 5 years ago
- PTX-EMU is a simple emulator for CUDA programs. ☆28 Updated last year
- A TVM-like CUDA/C code generator. ☆9 Updated 3 years ago
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo ☆11 Updated 2 years ago
- An MLIR frontend for tensor expressions ☆24 Updated 4 years ago
- Artifacts of EVT ASPLOS'24 ☆23 Updated 11 months ago
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation ☆27 Updated 5 years ago