modified cutlass
☆15Oct 26, 2020Updated 5 years ago
Alternatives and similar repositories for cutlass-bak
Users that are interested in cutlass-bak are comparing it to the libraries listed below
- ☆16Sep 24, 2024Updated last year
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆35Jul 28, 2020Updated 5 years ago
- CUDA Templates for Linear Algebra Subroutines☆102Apr 25, 2024Updated last year
- A unified programming framework for high and portable performance across FPGAs and GPUs☆11Mar 23, 2025Updated 11 months ago
- Utilities for paper writing.☆12Jan 11, 2026Updated 2 months ago
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)☆37Jul 30, 2025Updated 7 months ago
- study of cutlass☆22Nov 10, 2024Updated last year
- The (open-source part of) code to reproduce "BPPSA: Scaling Back-propagation by Parallel Scan Algorithm".☆13Jun 7, 2021Updated 4 years ago
- Stanford CS231n Convolutional Neural Networks for Visual Recognition Assignments☆11Aug 5, 2017Updated 8 years ago
- ☆20Sep 28, 2024Updated last year
- cuASR: CUDA Algebra for Semirings☆45Aug 22, 2022Updated 3 years ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19May 12, 2024Updated last year
- GEMM and Winograd based convolutions using CUTLASS☆28Jul 15, 2020Updated 5 years ago
- The Next-gen Language & Compiler Powering Efficient Hardware Design☆36Jan 16, 2025Updated last year
- Model-less Inference Serving☆94Nov 4, 2023Updated 2 years ago
- A new QR decomposition algorithm implemented in CUDA☆18Jun 24, 2024Updated last year
- World's first Nintendo 3DS emulator for Apple devices based on Citra.☆18Apr 7, 2023Updated 2 years ago
- Polyite: Iterative Schedule Optimization for Parallelization in the Polyhedron Model☆12Jan 19, 2020Updated 6 years ago
- OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection(ICCAD 2024)☆29Oct 20, 2024Updated last year
- ☆19Oct 29, 2025Updated 4 months ago
- ☆16Nov 2, 2022Updated 3 years ago
- ☆15Apr 15, 2022Updated 3 years ago
- PilotFish harvests the free GPU cycles of cloud gaming with deep learning training☆14Jul 2, 2022Updated 3 years ago
- MegEngine Documentations☆44Jan 15, 2021Updated 5 years ago
- Tool for inferring cache replacement policies with automata learning. Uses LearnLib and Sketch.☆16Apr 21, 2020Updated 5 years ago
- Using Feature Decomposition method to accelerate GNN inference☆13Sep 27, 2021Updated 4 years ago
- ☆13Dec 9, 2024Updated last year
- Peking University undergraduate thesis LaTeX template, modified from pkuthss 1.9.0☆27May 15, 2022Updated 3 years ago
- Create a new set of landmarks, or one with fewer points, from the dlib ibug_300W_large_face_landmark_dataset.☆10Nov 11, 2018Updated 7 years ago
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation☆27Nov 7, 2019Updated 6 years ago
- ☆14Mar 8, 2025Updated last year
- The code for IJCAI 2019 paper "Deep Cascade Generation on Point Sets"☆14Oct 3, 2023Updated 2 years ago
- ncnn export & infer mobileclip☆21Aug 18, 2025Updated 7 months ago
- Argus is a novel RDMA-assisted job scheduler which achieves high resource utilization by fully exploiting the structure feature of stage …☆10Apr 13, 2021Updated 4 years ago
- Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations"☆41Nov 16, 2021Updated 4 years ago
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows☆67Apr 12, 2024Updated last year
- ☆19Aug 26, 2021Updated 4 years ago
- PKU CompNet'19 Lab 2 - Homebrew TCP☆12Nov 29, 2019Updated 6 years ago
- ☆20Jan 12, 2022Updated 4 years ago