XiaoSongXS/dgemm-knl


DGEMM on KNL (Intel Knights Landing), achieving 75% of MKL performance

☆18

Alternatives and similar repositories for dgemm-knl

Users who are interested in dgemm-knl are comparing it to the repositories listed below.

XiaoSongXS / CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT]
☆328 · Nov 8, 2022 · Updated 3 years ago
2horse9sun / ucb_sp20_cs152_lab
UC Berkeley CS152 Computer Architecture and Engineering Labs
☆27 · Jun 17, 2020 · Updated 6 years ago
ironartisan / awesome-compression1
A beginner-friendly introductory tutorial on model compression
☆22 · Jul 7, 2024 · Updated 2 years ago
uwsampl / sparsetir-artifact
Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning"
☆25 · Feb 24, 2023 · Updated 3 years ago
xiaoyu1998 / llvm-cpu0
LLVM Backend tutorial Cpu0
☆27 · Nov 5, 2023 · Updated 2 years ago
KuangjuX / cu-x
🎉My Collections of CUDA Kernels~
☆11 · Jun 25, 2024 · Updated 2 years ago
AlexwellChen / Toy_ML_Framework
☆11 · May 16, 2026 · Updated 2 months ago
JiangLiSJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆157 · May 10, 2025 · Updated last year
piDack / The-ans-for-Programming-Massively-Parallel-Processor
Answers to the exercises in Programming Massively Parallel Processors: A Hands-on Approach (2nd edition)
☆36 · Jun 4, 2022 · Updated 4 years ago
yuxianzhi / Top-K
A way to use CUDA to accelerate the top-k algorithm
☆30 · Jul 11, 2017 · Updated 9 years ago
leaderit / graphql-postgres-template
Template for GraphQL API based on PostgreSQL database server, Hasura GraphQL engine, Redis memory cache, Fastify nodejs server for custom…
☆10 · Mar 3, 2021 · Updated 5 years ago
ZhW-loop / UniCoMo
☆13 · Sep 19, 2024 · Updated last year
openebs-archive / spdk-sys
Rust bindings for SPDK
☆12 · Mar 5, 2020 · Updated 6 years ago
linghaosong / Serpens
An HBM FPGA based SpMV Accelerator
☆19 · Aug 29, 2024 · Updated last year
bigconvience / LLVM-Essentials-13
learn javassist by example
☆30 · Dec 14, 2021 · Updated 4 years ago
sam-astro / NN-2
More advanced Unity Neural Network, built from scratch in C#
☆16 · May 15, 2023 · Updated 3 years ago
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆135 · Aug 12, 2023 · Updated 2 years ago
CASR-HKU / DPACS
☆20 · Mar 21, 2023 · Updated 3 years ago
guanlisheng / infobright-4.0.7
☆15 · Feb 1, 2016 · Updated 10 years ago
AlbertoParravicini / approximate-spmv-topk
Public repository for the DAC 2021 paper "Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs"
☆16 · Aug 29, 2021 · Updated 4 years ago
NVIDIA / clara-ia
CUDA accelerated medical imaging algorithms
☆16 · May 9, 2022 · Updated 4 years ago
KlassnayaAfrodita / AfroditaMQ
AfroditaMQ is a high-performance, asynchronous message broker designed for scalable and reliable message delivery. This broker supports e…
☆18 · Nov 21, 2024 · Updated last year
Liu-xiandong / How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,335 · Jul 29, 2023 · Updated 3 years ago
mirror12k / llm-shell
Utility to integrate ChatGPT (or other LLMs) into your shell.
☆15 · Mar 29, 2025 · Updated last year
leimao / Nsight-Compute-Docker-Image
Nsight Compute In Docker
☆13 · Dec 21, 2023 · Updated 2 years ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.
☆420 · Jan 2, 2025 · Updated last year
NamanMakkar / ECE5545-ML-Hardware-Systems
This repo contains the Assignments from Cornell Tech's ECE 5545 - Machine Learning Hardware and Systems offered in Spring 2023
☆44 · May 31, 2023 · Updated 3 years ago
hky1999 / Unishyper
A Rust-based Unikernel Enhancing Reliability and Efficiency of Embedded Systems.
☆12 · Jun 28, 2024 · Updated 2 years ago
SakuraILU / NJU-ICS2021-PA
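The PA (programming assignment) labs from NJU's ICS course: a fantastic large project that I learned a great deal from! A one-stop path through the whole stack, from the NEMU virtual machine to the NLiteOS operating system and the application layer.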
☆51 · Aug 3, 2022 · Updated 3 years ago
dt-3t / Transformer-en-to-cn
Chinese-English translation using a Transformer (demo)
☆17 · Aug 25, 2023 · Updated 2 years ago
sillycross / Leiserchess---MIT-6.172-Fall16-Final-Project
A fast implementation of a Leiserchess AI for MIT 6.172 (Fall 2016): http://scrimmage.csail.mit.edu/
☆12 · Dec 22, 2016 · Updated 9 years ago
yanqiangmiffy / tree2retriever
Recursive Abstractive Processing for Tree-Organized Retrieval
☆10 · May 30, 2024 · Updated 2 years ago
JoyZadan / shop-kbeauty
A multi-brand, ecommerce full stack project built using Django, Python and JavaScript, deployed to Heroku, uses Amazon S3 for cloud stora…
☆11 · Feb 9, 2023 · Updated 3 years ago
SJTU-ReArch-Group / Paper-Reading-List
☆154 · Updated this week
sen-ye / linux-clash
☆10 · Nov 14, 2023 · Updated 2 years ago
gem5-hpca-2024 / gem5
☆10 · Mar 3, 2024 · Updated 2 years ago
ElvisCheny / CUDA_C-Code
Example code for the book Professional CUDA C Programming (Chinese edition)
☆13 · Mar 22, 2023 · Updated 3 years ago
weishengying / cutlass_flash_atten_fp8
An FP8 flash attention implementation for the Ada architecture, built with the CUTLASS repository
☆82 · Aug 12, 2024 · Updated last year