baidu-research / catamount
Catamount is a compute graph analysis tool to load, construct, and modify deep learning models and to symbolically analyze their compute requirements
☆13 Updated 3 years ago
Alternatives and similar repositories for catamount:
Users who are interested in catamount are comparing it to the libraries listed below.
- Generating Families of Practical Fast Matrix Multiplication Algorithms ☆12 Updated 7 years ago
- An ONNX backend using PlaidML ☆28 Updated 6 years ago
- Input-aware cuBLAS/clBLAS implementation for better performance ☆17 Updated 2 years ago
- Python bindings for libNVVM ☆36 Updated 10 years ago
- nGraph™ Backend for ONNX ☆42 Updated 2 years ago
- ☆14 Updated 5 years ago
- GPU Automatically Tuned Linear Algebra Software ☆28 Updated 9 years ago
- Scout -- Domain Specific Language & Toolchain ☆15 Updated 8 years ago
- ☆9 Updated 5 years ago
- A CUDA implementation of the PageRank Pipeline Benchmark ☆32 Updated 7 years ago
- DSL for stencils and image processing ☆13 Updated 8 years ago
- Fork of magma to include more BLAS ☆28 Updated 8 years ago
- Scientific library for high-precision computations and research ☆50 Updated 7 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- ☆15 Updated 6 years ago
- Nitro Autotuning Framework ☆9 Updated 8 years ago
- The Hybrid Task Graph Scheduler API ☆40 Updated 3 years ago
- Based on SciPy's normalized git stats, adapted for Deep Learning frameworks ☆16 Updated 7 years ago
- An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations ☆16 Updated 4 years ago
- Project ARES represents a joint effort between LANL and ORNL to introduce a common compiler representation and tool-chain for HPC applica… ☆10 Updated 8 years ago
- ☆14 Updated 8 years ago
- ☆10 Updated 2 years ago
- Cairo lua bindings with extensions for torch ☆15 Updated 8 years ago
- npcomp - An aspirational MLIR based numpy compiler ☆51 Updated 4 years ago
- Code examples for CUDA and OpenACC ☆34 Updated 4 months ago
- A CUDA implementation of the Tsetlin Machine based on bitwise operators ☆26 Updated 5 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 Updated 7 years ago
- Cephes Mathematical Functions library wrapped for Torch ☆47 Updated 8 years ago
- Experimental Linear Algebra Performance Studies ☆12 Updated 7 years ago
- Sublinear memory optimization for deep learning, reduce GPU memory cost to train deeper nets ☆29 Updated 8 years ago