Bartanakin / BartaEngine
☆12 Updated this week
Related projects
Alternatives and complementary repositories for BartaEngine
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme… ☆16 Updated 6 months ago
- Batch-system simulation for elastic workloads ☆11 Updated last month
- Source code of the paper "OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs" ☆11 Updated 2 years ago
- ☆27 Updated 3 years ago
- Skeleton CMake project that integrates Google Tests ☆14 Updated last year
- CUDA Learning guide ☆243 Updated 4 months ago
- GPU Static Modeling using PTX and Deep Structured Learning ☆17 Updated 4 years ago
- SYCL Academy, a set of learning materials for SYCL heterogeneous programming ☆454 Updated this week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆181 Updated last week
- CUDA Core Compute Libraries ☆1,252 Updated this week
- Header-Only C++ Library for Graph Representation and Algorithms ☆468 Updated last month
- CMake for C++ Best Practices ☆1,130 Updated 3 months ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆555 Updated last week
- Slides and other materials from CppCon 2023 ☆287 Updated 8 months ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang… ☆9 Updated 2 years ago
- Example Makefile for CUDA and C++ source files in a standard project layout. ☆47 Updated 6 years ago
- C++ library for reading and writing of numpy's .npy files ☆373 Updated last month
- GPU programming related news and material links ☆1,216 Updated last month
- CUDA Library Samples ☆1,606 Updated last week
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆607 Updated 2 months ago
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding ☆12 Updated 3 years ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) ☆79 Updated last year
- Some CUDA projects and utility ☆28 Updated 5 years ago
- Performance Prediction Toolkit ☆51 Updated 2 years ago
- Read custom dataset ☆11 Updated last year
- Fast CUDA matrix multiplication from scratch ☆473 Updated 10 months ago
- ☆485 Updated this week
- Caliper is an instrumentation and performance profiling library ☆352 Updated this week
- LLM Inference analyzer for different hardware platforms ☆42 Updated this week
- Parallel SpMV using CSR representation, built in CUDA ☆12 Updated 4 years ago