zenny-chen / Intel-AVX512-Brief-Introduction
A brief introduction to Intel AVX-512
☆49Updated last year
Alternatives and similar repositories for Intel-AVX512-Brief-Introduction
Users who are interested in Intel-AVX512-Brief-Introduction are comparing it to the repositories listed below
- High-performance RDMA-based distributed feature collection component for training GNN models on EXTREMELY large graphs☆54Updated 2 years ago
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.☆121Updated last year
- Learning RDMA programming☆26Updated 3 years ago
- example code for using DC QP for providing RDMA READ and WRITE operations to remote GPU memory☆131Updated 10 months ago
- Example code for Intel AVX / AVX2 intrinsics.☆138Updated last year
- Online GitBook e-book of https://github.com/dendibakh/perf-book, translated into Chinese, with the original Markdown documents☆88Updated 5 months ago
- RoCE v2 hardware and software implementation☆155Updated 8 months ago
- C++ interfaces for RDMA access☆77Updated last week
- Chinese translation of the CUDA PTX-ISA document☆42Updated last week
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs☆89Updated last year
- ☆35Updated 3 years ago
- Mellanox libibverbs☆68Updated 5 years ago
- CPU micro benchmarks☆57Updated last week
- Automatic virtualization of (general) accelerators.☆44Updated 2 years ago
- Stepwise optimizations of DGEMM on CPU, eventually running faster than Intel MKL, even under multithreading.☆148Updated 3 years ago
- Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference☆18Updated 2 years ago
- ☆21Updated last week
- A layered, decoupled deep learning inference engine☆73Updated 3 months ago
- ☆182Updated 2 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆83Updated 2 years ago
- ☆68Updated 7 months ago
- A highly-flexible GPU simulator for AMD GPUs.☆151Updated this week
- Yet another toy CPU.☆91Updated last year
- A user-space test platform for testing the p2pdma Linux kernel framework with NVMe CMBs and other PCIe BAR memory.☆53Updated 2 years ago
- ☆25Updated 3 months ago
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.☆65Updated last week
- Magnum IO community repo☆95Updated 2 weeks ago
- GPUDirect example☆60Updated 3 years ago
- Accel-config / libaccel-config☆66Updated last month
- Dissecting NVIDIA GPU Architecture☆95Updated 2 years ago