kooyunmo / cuda-uvm-gpt2

PyTorch-UVM on super-large language models.

☆17

Alternatives and similar repositories for cuda-uvm-gpt2

Users who are interested in cuda-uvm-gpt2 are comparing it to the libraries listed below.

casys-kaist / HUVM
☆24 · Updated 3 years ago
platformxlab / G10
☆40 · Updated 2 years ago
OSU-STARLAB / UVM_benchmark
☆30 · Updated 5 years ago
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆52 · Updated last year
kooyunmo / pytorch-uvm
Tensors and Dynamic neural networks in Python with strong GPU acceleration
☆15 · Updated 4 years ago
tallendev / uvm-eval
This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated…
☆35 · Updated 2 years ago
c3sr / tcu_scope
☆50 · Updated 6 years ago
AIS-SNU / PID-Comm
☆27 · Updated 10 months ago
YukeWang96 / MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…
☆40 · Updated last year
SNU-ARC / flashneuron
☆39 · Updated 2 years ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 · Updated 2 years ago
DebashisGanguly / gpgpu-sim_UVMSmart
☆78 · Updated 4 years ago
SNU-ARC / MERCI
☆18 · Updated 4 years ago
sitar-lab / NeuSight
☆53 · Updated 4 months ago
parasailteam / coconet
☆83 · Updated 2 years ago
jeongminpark417 / GIDS
☆37 · Updated 4 months ago
jiazhihao / ROC
Distributed Multi-GPU GNN Framework
☆36 · Updated 5 years ago
apuaaChen / vectorSparse
☆32 · Updated 3 years ago
YukeWang96 / GNNAdvisor_OSDI21
Artifact for OSDI'21 GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.
☆66 · Updated 2 years ago
YukeWang96 / TC-GNN_ATC23
Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
☆50 · Updated 2 years ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆73 · Updated 5 years ago
PSAL-POSTECH / M2NDP-public
A Cycle-level simulator for M2NDP
☆31 · Updated 2 months ago
CMU-SAFARI / Mosaic
Source code of the simulator used in the Mosaic paper from MICRO 2017: "Mosaic: A GPU Memory Manager with Application-Transparent Support…
☆49 · Updated 7 years ago
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆57 · Updated 2 months ago
Raphael-Hao / Abacus
☆38 · Updated 3 months ago
utcs-scea / altis
A benchmarking suite for heterogeneous systems. The primary goal of this project is to improve and update aspects of existing benchmarkin…
☆42 · Updated last year
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆35 · Updated 5 years ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116 · Updated last year
ucare-uchicago / ev-store-dlrm
☆31 · Updated last year
BoyuanFeng / APNN-TC
☆19 · Updated 4 years ago