Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs, based on cublasHgemm
☆35Aug 20, 2019Updated 6 years ago
Alternatives and similar repositories for cublasHgemm-P100
Users that are interested in cublasHgemm-P100 are comparing it to the libraries listed below.
- OpenVX API and extension specification documents☆19Mar 18, 2026Updated last month
- C++ CPU inference library for Tensorflow object detection models based on the lightweight Tensorflow C-API.☆15Jun 26, 2018Updated 7 years ago
- Sparse Boolean linear algebra for NVIDIA CUDA, OpenCL and CPU computations☆16Aug 19, 2022Updated 3 years ago
- outline and links for PLDI 2022 tutorial☆17Jun 13, 2022Updated 3 years ago
- HCC Sample Applications☆13Jan 3, 2017Updated 9 years ago
- TensorRT Int8 Python version sample.☆14Jan 28, 2019Updated 7 years ago
- One-click deployment of the SonarQube static code analysis platform, with results persisted on the host machine.☆12Jul 5, 2018Updated 7 years ago
- An example of how to communicate with a service class through a Binder.☆10Aug 12, 2015Updated 10 years ago
- Pure TensorFlow implementation of YOLOv3 with support for training on your own dataset☆18Jan 12, 2019Updated 7 years ago
- ☆14Jan 12, 2022Updated 4 years ago
- Samples and documentation for deploying EDA computing environments in AWS☆31Feb 6, 2026Updated 2 months ago
- Benchmark scripts for comparing tutorials in PyTorch and JAX☆14Aug 25, 2022Updated 3 years ago
- Provides a vendored libjxl.☆16Oct 13, 2022Updated 3 years ago
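- Uses e-commerce shopping payments as the business scenario of a distributed project, and through it fully implements and solves the distributed transaction problem across distributed services☆17Apr 29, 2018Updated 8 years ago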
- Fast Lossless Color Image Compression Library☆10Jun 21, 2022Updated 3 years ago
- code for benchmarking GPU performance based on cublasSgemm and cublasHgemm☆35May 20, 2022Updated 3 years ago
- ☆28Nov 6, 2024Updated last year
- Implementation of The One Hundred Layers Tiramisu for semantic segmentation in Keras☆10Oct 23, 2018Updated 7 years ago
- A photo-sharing app with only verifiable photos and videos for professionals.☆18Jan 27, 2023Updated 3 years ago
- An implementation of our CVPR 2018 work 'Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning'☆43Jul 12, 2019Updated 6 years ago
- RaNNC is an automatic parallelization middleware used to train very large-scale neural networks.☆57Oct 15, 2022Updated 3 years ago
- Simple script to convert a frozen tensorflow .pb file to TensorRT UFF format☆18Jul 12, 2019Updated 6 years ago
- ☆16Jan 26, 2020Updated 6 years ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆33Mar 10, 2026Updated last month
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference☆12Apr 11, 2024Updated 2 years ago
- High-performance cross-platform inference engine; you can run Anakin on x86-cpu, arm, nv-gpu, amd-gpu, bitmain and cambricon devices.☆537Sep 23, 2022Updated 3 years ago
- ☆13Nov 7, 2021Updated 4 years ago
- Polyglot CUDA integration for the GraalVM☆18Apr 6, 2025Updated last year
- TensorFlow implementation of various GANs.☆11Mar 14, 2020Updated 6 years ago
- Sparse matrix-matrix multiplication on CPU+GPU systems.☆13Mar 17, 2014Updated 12 years ago
- This repository is for my YT video series about optimizing a Tensorflow deep learning model using TensorRT. We demonstrate optimizing LeN…☆300Jul 4, 2019Updated 6 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆36Oct 13, 2024Updated last year
- ngAP's artifact for ASPLOS'24☆25Jul 29, 2025Updated 9 months ago
- Code for the paper "Understanding the Role of Momentum in Stochastic Gradient Methods"☆14Oct 27, 2019Updated 6 years ago
- Generate publication-quality figures using Python☆23Jun 5, 2016Updated 9 years ago
- ☆18Oct 24, 2013Updated 12 years ago
- Chainer implementation of CIFAR-10 dataset training☆12Dec 7, 2022Updated 3 years ago
- This is the implementation of our paper: Conditional Prior Networks for Optical Flow☆20Jul 15, 2019Updated 6 years ago
- ☆19Feb 5, 2021Updated 5 years ago