Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs using cublasHgemm
☆35 · Aug 20, 2019 · Updated 6 years ago
Alternatives and similar repositories for cublasHgemm-P100
Users that are interested in cublasHgemm-P100 are comparing it to the libraries listed below.
- A FASTQ lossless compression algorithm especially designed for nanopore sequencing FASTQ files. ☆10 · Jul 2, 2020 · Updated 6 years ago
- Tensorflow model export from Python to C++ and inference without using TF library ☆17 · Mar 13, 2019 · Updated 7 years ago
- C++ CPU inference library for Tensorflow object detection models based on the lightweight Tensorflow C-API. ☆15 · Jun 26, 2018 · Updated 8 years ago
- outline and links for PLDI 2022 tutorial ☆17 · Jun 13, 2022 · Updated 4 years ago
- HCC Sample Applications ☆13 · Jan 3, 2017 · Updated 9 years ago
- [MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models ☆16 · May 5, 2023 · Updated 3 years ago
- ☆14 · Jan 12, 2022 · Updated 4 years ago
- A network that generates names, based on a conditional RNN; set a condition to generate different names. ☆12 · May 15, 2017 · Updated 9 years ago
- A copy of the DirectX Headers from MinGW-64. ☆14 · Sep 7, 2023 · Updated 2 years ago
- ☆12 · Dec 2, 2014 · Updated 11 years ago
- YOLOv3-training-prune ☆58 · Mar 9, 2021 · Updated 5 years ago
- PlayStation1 MDEC compression tools ☆11 · Dec 31, 2020 · Updated 5 years ago
- An extension of deeplab-v2 (in TF) allowing for smoothed dilated convolutions ☆12 · Mar 27, 2019 · Updated 7 years ago
- Benchmark scripts for comparing tutorials in PyTorch and JAX ☆14 · Aug 25, 2022 · Updated 3 years ago
- Provides a vendored libjxl. ☆16 · Oct 13, 2022 · Updated 3 years ago
- Simple Qt OpenGL SVG rendering benchmark ☆15 · Sep 18, 2011 · Updated 14 years ago
- Implementation of All-Frequency Shadows Using Non-linear Wavelet Lighting Approximation by Ren Ng et al. ☆11 · Jul 14, 2019 · Updated 6 years ago
- Robust Real-time Object Detection for the Nao Robots ☆19 · May 18, 2022 · Updated 4 years ago
- Implementation of The One Hundred Layers Tiramisu for semantic segmentation in Keras ☆10 · Oct 23, 2018 · Updated 7 years ago
- Implementation of the naive version of relationNet ☆12 · Dec 4, 2017 · Updated 8 years ago
- USD build script for aarch64 target ☆11 · Oct 28, 2022 · Updated 3 years ago
- An implementation of our CVPR 2018 work 'Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning' ☆43 · Jul 12, 2019 · Updated 6 years ago
- ☆28 · Apr 9, 2025 · Updated last year
- Simple script to convert a frozen tensorflow .pb file to TensorRT UFF format ☆18 · Jul 12, 2019 · Updated 6 years ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat… ☆33 · Mar 10, 2026 · Updated 3 months ago
- Miscellaneous graduate-student SLAM material ☆13 · Oct 31, 2020 · Updated 5 years ago
- Dialogue Graph Modeling for Conversational Machine Reading (ACL 2021, Findings) ☆18 · Nov 29, 2022 · Updated 3 years ago
- Rust implementation of k-d tree to efficiently perform color quantization to predefined sets ☆13 · Feb 14, 2018 · Updated 8 years ago
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference ☆12 · Apr 11, 2024 · Updated 2 years ago
- High-performance cross-platform inference engine; Anakin runs on x86-cpu, arm, nv-gpu, amd-gpu, bitmain and cambricon devices. ☆538 · Sep 23, 2022 · Updated 3 years ago
- Source for our HPG Paper "CPU-Style SIMD Ray Traversal on GPUs" ☆15 · Aug 31, 2018 · Updated 7 years ago
- Spherical Harmonics library inspired by D3DX ☆16 · Jan 23, 2012 · Updated 14 years ago
- ☆13 · Mar 2, 2021 · Updated 5 years ago
- Paddle reproduction of the paper ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information (ACL 2021) ☆10 · Nov 15, 2021 · Updated 4 years ago
- This repository is for my YT video series about optimizing a Tensorflow deep learning model using TensorRT. We demonstrate optimizing LeN… ☆300 · Jul 4, 2019 · Updated 6 years ago
- OpenCL path tracer written in Python ☆16 · Oct 16, 2016 · Updated 9 years ago
- Video classification using convGRU ☆13 · Feb 15, 2018 · Updated 8 years ago
- Tornado Web Server git repository for OpenShift with Python 3.3 ☆15 · Dec 13, 2015 · Updated 10 years ago
- Implementation of our CVPR2019 paper on Depth Completion: Dense Depth Posterior (DDP) from Single Image and Sparse Range ☆17 · Mar 30, 2019 · Updated 7 years ago