mlcommons / mobile_models
MLPerf™ Mobile models
☆24 Updated last month
Related projects
Alternatives and complementary repositories for mobile_models
- ☆18 Updated 3 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆46 Updated 2 months ago
- ONNX Parser is a tool that automatically generates OpenVX inference code (CNN) from ONNX binary model files. ☆17 Updated 5 years ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 Updated 10 months ago
- ☆11 Updated 4 years ago
- LLVM-Canon aims to transform LLVM modules into a canonical form by reordering and renaming instructions while preserving the same semanti… ☆12 Updated 6 months ago
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 Updated last month
- Whippletree, a novel approach to scheduling dynamic, irregular workloads on the GPU ☆21 Updated 8 years ago
- cuASR: CUDA Algebra for Semirings ☆34 Updated 2 years ago
- CUDA accelerated medical imaging algorithms ☆13 Updated 2 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope… ☆55 Updated this week
- A tracing JIT compiler for PyTorch ☆12 Updated 2 years ago
- CUDA Template Functions ☆18 Updated 3 months ago
- benchmarking some transformer deployments ☆26 Updated last year
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 Updated 8 years ago
- Sample programs for the LLVM PTX back-end ☆34 Updated 9 years ago
- Template for starting CUDA/C++ project using CMake with Github Action for CI ☆29 Updated last year
- An 8-/16-/32-/64-bit floating point number family ☆16 Updated 2 years ago
- ☆33 Updated last year
- tokenizer and parser for circle projects ☆11 Updated 5 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 Updated 7 years ago
- SYCL-ML is a C++ library, implementing classical machine learning algorithms using SYCL. ☆64 Updated 4 years ago
- ☆17 Updated last month
- Benchmarks to capture important workloads. ☆28 Updated 5 months ago
- An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations ☆16 Updated 4 years ago
- ☆67 Updated 2 years ago
- A GPU performance prediction toolkit for CUDA programs ☆16 Updated 5 years ago
- C++ to OpenCL C Source-to-source Translation ☆13 Updated 10 years ago