Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs using cublasHgemm
☆35 · Aug 20, 2019 · Updated 6 years ago
Alternatives and similar repositories for cublasHgemm-P100
Users that are interested in cublasHgemm-P100 are comparing it to the libraries listed below.
- C++ CPU inference library for TensorFlow object detection models, based on the lightweight TensorFlow C API. ☆15 · Jun 26, 2018 · Updated 7 years ago
- ☆10 · May 12, 2022 · Updated 4 years ago
- TensorRT Int8 Python version sample. ☆14 · Jan 28, 2019 · Updated 7 years ago
- Pure TensorFlow implementation of YOLOv3 with support for training on your own dataset ☆18 · Jan 12, 2019 · Updated 7 years ago
- Overclocking the Jetson Nano CPU and GPU ☆22 · Oct 12, 2020 · Updated 5 years ago
- A network that generates names, based on a conditional RNN: set a condition and generate different names. ☆12 · May 15, 2017 · Updated 9 years ago
- A copy of the DirectX Headers from MinGW-64. ☆14 · Sep 7, 2023 · Updated 2 years ago
- ☆10 · Aug 18, 2016 · Updated 9 years ago
- YOLOv3-training-prune ☆58 · Mar 9, 2021 · Updated 5 years ago
- PlayStation1 MDEC compression tools ☆11 · Dec 31, 2020 · Updated 5 years ago
- An extension of deeplab-v2 (in TF) allowing for smoothed dilated convolutions ☆12 · Mar 27, 2019 · Updated 7 years ago
- Caffe: a fast open framework for deep learning. ☆14 · Jun 2, 2016 · Updated 10 years ago
- Implementation of All-Frequency Shadows Using Non-linear Wavelet Lighting Approximation by Ren Ng et al. ☆11 · Jul 14, 2019 · Updated 6 years ago
- ☆27 · Nov 6, 2024 · Updated last year
- Implementation of relationNet (naive version) ☆12 · Dec 4, 2017 · Updated 8 years ago
- RaNNC is an automatic parallelization middleware used to train very large-scale neural networks. ☆57 · Oct 15, 2022 · Updated 3 years ago
- Simple script to convert a frozen TensorFlow .pb file to TensorRT UFF format ☆18 · Jul 12, 2019 · Updated 6 years ago
- Dialogue Graph Modeling for Conversational Machine Reading (ACL 2021, Findings) ☆18 · Nov 29, 2022 · Updated 3 years ago
- Rust implementation of k-d tree to efficiently perform color quantization to predefined sets ☆13 · Feb 14, 2018 · Updated 8 years ago
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference ☆12 · Apr 11, 2024 · Updated 2 years ago
- Investigations into simplified holdem poker ☆12 · Oct 17, 2012 · Updated 13 years ago
- High-performance cross-platform inference engine; Anakin runs on x86-cpu, arm, nv-gpu, amd-gpu, bitmain and cambricon devices. ☆536 · Sep 23, 2022 · Updated 3 years ago
- Polyglot CUDA integration for the GraalVM ☆18 · Apr 6, 2025 · Updated last year
- TensorFlow implementation of various GANs. ☆11 · Mar 14, 2020 · Updated 6 years ago
- ☆10 · Apr 23, 2021 · Updated 5 years ago
- Source for our HPG Paper "CPU-Style SIMD Ray Traversal on GPUs" ☆15 · Aug 31, 2018 · Updated 7 years ago
- Sparse matrix-matrix multiplication on CPU+GPU systems. ☆13 · Mar 17, 2014 · Updated 12 years ago
- ☆16 · Jan 16, 2023 · Updated 3 years ago
- Demonstrates order independent transparency on Vulkan using depth peeling. ☆20 · Oct 24, 2017 · Updated 8 years ago
- Spherical Harmonics library inspired by D3DX ☆16 · Jan 23, 2012 · Updated 14 years ago
- Getting started with OpenCL on Visual Studio and its configuration ☆12 · Nov 28, 2021 · Updated 4 years ago
- ☆13 · Mar 2, 2021 · Updated 5 years ago
- This repository is for my YT video series about optimizing a TensorFlow deep learning model using TensorRT. We demonstrate optimizing LeN… ☆300 · Jul 4, 2019 · Updated 6 years ago
- Video classification using convGRU ☆13 · Feb 15, 2018 · Updated 8 years ago
- ngAP's artifact for ASPLOS'24 ☆25 · Jul 29, 2025 · Updated 10 months ago
- Code for the paper "Understanding the Role of Momentum in Stochastic Gradient Methods" ☆14 · Oct 27, 2019 · Updated 6 years ago
- Simple port of hpl-2.0 to use NVIDIA GPU acceleration with CUBLAS ☆29 · May 13, 2013 · Updated 13 years ago
- Generate publication-quality figures using Python ☆23 · Jun 5, 2016 · Updated 10 years ago
- ☆18 · Oct 24, 2013 · Updated 12 years ago