Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs using cublasHgemm
☆35Aug 20, 2019Updated 6 years ago
Alternatives and similar repositories for cublasHgemm-P100
Users that are interested in cublasHgemm-P100 are comparing it to the libraries listed below
- ☆26May 22, 2023Updated 2 years ago
- Tensorflow model export from Python to C++ and inference without using TF library☆17Mar 13, 2019Updated 7 years ago
- C++ CPU inference library for Tensorflow object detection models based on the lightweight Tensorflow C-API.☆15Jun 26, 2018Updated 7 years ago
- ☆10May 12, 2022Updated 3 years ago
- HCC Sample Applications☆13Jan 3, 2017Updated 9 years ago
- TensorRT Int8 Python version sample.☆14Jan 28, 2019Updated 7 years ago
- [MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models☆16May 5, 2023Updated 2 years ago
- Scrapes novels published on the web and converts them to Aozora Bunko format text☆19Feb 9, 2017Updated 9 years ago
- A network that generates names based on a conditional RNN; set a condition to generate different names.☆12May 15, 2017Updated 8 years ago
- RDMA Optimization on MXNet☆14Nov 12, 2017Updated 8 years ago
- YOLOv3-training-prune☆58Mar 9, 2021Updated 5 years ago
- An extension of deeplab-v2 (in TF) allowing for smoothed dilated convolutions☆12Mar 27, 2019Updated 6 years ago
- Benchmark scripts for comparing tutorials in PyTorch and JAX☆14Aug 25, 2022Updated 3 years ago
- Caffe: a fast open framework for deep learning.☆14Jun 2, 2016Updated 9 years ago
- Luthier, a GPU binary instrumentation tool for AMD GPUs☆27Mar 13, 2026Updated last week
- code for benchmarking GPU performance based on cublasSgemm and cublasHgemm☆34May 20, 2022Updated 3 years ago
- Implementation of The One Hundred Layers Tiramisu for semantic segmentation in Keras☆10Oct 23, 2018Updated 7 years ago
- Simple script to convert a frozen tensorflow .pb file to TensorRT UFF format☆18Jul 12, 2019Updated 6 years ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆33Mar 10, 2026Updated last week
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference☆11Apr 11, 2024Updated last year
- Accelerating Database Operations on a GPU with CUDA☆18Oct 5, 2015Updated 10 years ago
- High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.☆537Sep 23, 2022Updated 3 years ago
- Pure-header, no dependency C/C++ code to write Numpy files (.npy)☆12May 13, 2017Updated 8 years ago
- Artifact for 'Register Optimizations for Stencils on GPUs'☆10Sep 18, 2018Updated 7 years ago
- TensorFlow implementation of various GANs.☆11Mar 14, 2020Updated 6 years ago
- ☆11Apr 23, 2021Updated 4 years ago
- Sparse matrix-matrix multiplication on CPU+GPU systems.☆13Mar 17, 2014Updated 12 years ago
- ☆16Jan 16, 2023Updated 3 years ago
- This repository is for my YT video series about optimizing a Tensorflow deep learning model using TensorRT. We demonstrate optimizing LeN…☆300Jul 4, 2019Updated 6 years ago
- Video classification using convGRU☆13Feb 15, 2018Updated 8 years ago
- Tornado Web Server git repository for OpenShift with Python 3.3☆15Dec 13, 2015Updated 10 years ago
- Implementation of our CVPR2019 paper on Depth Completion: Dense Depth Posterior (DDP) from Single Image and Sparse Range☆17Mar 30, 2019Updated 6 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆30Oct 13, 2024Updated last year
- simple port of hpl-2.0 to use NVIDIA GPU acceleration with CUBLAS☆29May 13, 2013Updated 12 years ago
- This repository provides a sample to run yolov3 on int8 mode in tensorRT☆25Sep 6, 2019Updated 6 years ago
- Generate publication-quality figures using python☆23Jun 5, 2016Updated 9 years ago
- ☆18Oct 24, 2013Updated 12 years ago
- A dynamic version of std::bitset☆17Aug 25, 2013Updated 12 years ago
- ☆11Apr 10, 2015Updated 10 years ago