lcy-seso / DLFrameworkTest
My tests and experiments with some popular deep learning frameworks.
☆11Updated 3 months ago
Alternatives and similar repositories for DLFrameworkTest:
Users who are interested in DLFrameworkTest are comparing it to the repositories listed below
- ☆19Updated 3 months ago
- ThrillerFlow is a Dataflow Analysis and Codegen Framework written in Rust.☆14Updated last month
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆19Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19Updated 8 months ago
- Benchmark tests supporting the TiledCUDA library.☆12Updated 2 months ago
- PTX-EMU is a simple emulator for CUDA programs.☆26Updated last year
- An external memory allocator example for PyTorch.☆14Updated 3 years ago
- Artifacts of EVT ASPLOS'24☆23Updated 10 months ago
- ☆39Updated this week
- An Attention Superoptimizer☆20Updated 8 months ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆18Updated 3 years ago
- Study of Ampere's Sparse Matmul☆16Updated 4 years ago
- modified cutlass☆14Updated 4 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆87Updated 10 months ago
- ☆11Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆14Updated 5 years ago
- ☆8Updated last year
- GPTQ inference TVM kernel☆38Updated 8 months ago
- ☆14Updated 2 years ago
- Noisy language compiler☆17Updated 5 months ago
- ☆25Updated 9 months ago
- ☆17Updated 4 years ago
- An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.☆51Updated 5 months ago
- Benchmark PyTorch Custom Operators☆13Updated last year
- ☆24Updated this week
- Optimize tensor programs quickly with Felix, a gradient descent autotuner.☆24Updated 8 months ago
- Yet another polyhedral compiler for deep learning☆19Updated last year
- ☆38Updated 7 months ago
- ☆22Updated 3 weeks ago