zenny-chen / Intel-AVX512-Brief-Introduction
A brief introduction to Intel AVX-512
☆49Updated last year
Alternatives and similar repositories for Intel-AVX512-Brief-Introduction
Users that are interested in Intel-AVX512-Brief-Introduction are comparing it to the libraries listed below
- High-performance RDMA-based distributed feature collection component for training GNN models on EXTREMELY large graphs☆54Updated 2 years ago
- Example code for Intel AVX / AVX2 intrinsics.☆138Updated last year
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.☆148Updated 3 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs☆90Updated last year
- Automatic virtualization of (general) accelerators.☆43Updated 2 years ago
- Chinese translation of the CUDA PTX-ISA document☆42Updated last month
- Advanced Matrix Extensions (AMX) Guide☆92Updated 3 years ago
- RoCE v2 hardware and software implementation☆158Updated 9 months ago
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.☆122Updated last year
- GPUDirect example☆60Updated 3 years ago
- Yet another toy CPU.☆91Updated last year
- ☆11Updated last year
- Magnum IO community repo☆95Updated last month
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.☆66Updated 3 weeks ago
- C++ interfaces for RDMA access☆77Updated last week
- CPU micro benchmarks☆58Updated last week
- Mellanox libibverbs☆69Updated 5 years ago
- ☆25Updated 4 months ago
- Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference☆18Updated 2 years ago
- ☆69Updated 8 months ago
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems”☆41Updated 2 years ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.☆87Updated last month
- A user-space test platform for testing the p2pdma Linux kernel framework with NVMe CMBs and other PCIe BAR memory.☆53Updated 2 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆27Updated 4 months ago
- A tutorial on RDMA based programming using code examples☆39Updated 6 years ago
- Intel accelerators zoo: like one of its solutions, the Intel® Vector Data Streaming Library, it is a zoo of solutions based on Intel 4th Xeon pr…☆30Updated 7 months ago
- GVProf: A Value Profiler for GPU-based Clusters☆50Updated last year
- ☆36Updated 5 months ago
- Hardware test for CPU, GPU, I/O, and memory bandwidth performance☆25Updated 6 years ago
- Code samples related to Intel(R) AMX☆39Updated last year