anilshanbhag / gpu-topk

Efficient Top-K implementation on the GPU

☆179

Alternatives and similar repositories for gpu-topk

Users who are interested in gpu-topk are comparing it to the repositories listed below.

sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆97 Updated 2 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆222 Updated 3 years ago
linnanwang / superneurons-release
this is the release repository of superneurons
☆52 Updated 4 years ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆335 Updated 2 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆32 Updated 4 years ago
njuhope / cuda_sgemm
☆113 Updated last year
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆121 Updated 3 years ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆357 Updated 5 months ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated 11 months ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆72 Updated 4 years ago
reed-lau / cute-gemm
☆123 Updated 6 months ago
pku-liang / FlexTensor
Automatic Schedule Exploration and Optimization Framework for Tensor Computations
☆176 Updated 3 years ago
sunlex0717 / DissectingTensorCores
☆98 Updated last year
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 Updated 5 months ago
nicolaswilde / cuda-tensorcore-hgemm
☆146 Updated 6 months ago
mit-han-lab / inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
☆200 Updated 3 years ago
NVIDIA-Merlin / HierarchicalKV
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of…
☆152 Updated last week
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆265 Updated 4 years ago
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆105 Updated 9 months ago
MARD1NO / CUDA-PPT
☆97 Updated 2 months ago
NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆84 Updated last year
owensgroup / SlabHash
A warp-oriented dynamic hash table for GPUs
☆73 Updated last year
lixiuhong / batched_gemm
☆39 Updated 5 years ago
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆382 Updated last year
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆133 Updated last year
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆94 Updated 6 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70 Updated 8 years ago
XiuYuLi / deepcore_source_code
Subpart source code of deepcore v0.7
☆27 Updated 5 years ago
masahi / tvm-cutlass-eval
☆40 Updated 3 years ago
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆105 Updated 7 years ago