modified cutlass
☆15 · last updated Oct 26, 2020
Alternatives and similar repositories for cutlass-bak
Users who are interested in cutlass-bak are comparing it to the libraries listed below.
- ☆16 · last updated Sep 24, 2024
- Implementation of TSM2L and TSM2R: High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA · ☆35 · last updated Jul 28, 2020
- CUDA Templates for Linear Algebra Subroutines · ☆102 · last updated Apr 25, 2024
- A unified programming framework for high and portable performance across FPGAs and GPUs · ☆11 · last updated Mar 23, 2025
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator (previously iGEMMgen) · ☆37 · last updated Jul 30, 2025
- Handy tools & graphics API abstraction for blazing fast prototyping · ☆10 · last updated Jan 17, 2024
- study of cutlass · ☆22 · last updated Nov 10, 2024
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration · ☆15 · last updated Sep 14, 2020
- The (open-source part of) code to reproduce "BPPSA: Scaling Back-propagation by Parallel Scan Algorithm" · ☆13 · last updated Jun 7, 2021
- SParse AcceleRation on Tensor Architecture · ☆18 · last updated Apr 7, 2025
- Stanford CS231n Convolutional Neural Networks for Visual Recognition Assignments · ☆11 · last updated Aug 5, 2017
- cuASR: CUDA Algebra for Semirings · ☆45 · last updated Aug 22, 2022
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels · ☆20 · last updated Jul 13, 2025
- Homework solutions to 2017 Fall Algorithm Courses in ShanghaiTech · ☆10 · last updated Jan 5, 2018
- GEMM and Winograd based convolutions using CUTLASS · ☆28 · last updated Jul 15, 2020
- The Next-gen Language & Compiler Powering Efficient Hardware Design · ☆36 · last updated Jan 16, 2025
- An object detection codebase based on MegEngine · ☆28 · last updated Dec 14, 2022
- Model-less Inference Serving · ☆94 · last updated Nov 4, 2023
- World's first Nintendo 3DS emulator for Apple devices based on Citra · ☆18 · last updated Apr 7, 2023
- ☆15 · last updated Dec 16, 2021
- Polyite: Iterative Schedule Optimization for Parallelization in the Polyhedron Model · ☆12 · last updated Jan 19, 2020
- OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection (ICCAD 2024) · ☆29 · last updated Oct 20, 2024
- Thallium is a C++14 library wrapping Margo, Mercury, and Argobots and providing an object-oriented way to use these libraries · ☆14 · last updated Apr 3, 2026
- ☆15 · last updated Apr 15, 2022
- PilotFish harvests the free GPU cycles of cloud gaming with deep learning training · ☆14 · last updated Jul 2, 2022
- Multiplication using AVX512 and AVX512IFMA instructions · ☆23 · last updated Nov 9, 2015
- Tool for inferring cache replacement policies with automata learning. Uses LearnLib and Sketch · ☆16 · last updated Apr 21, 2020
- Using the Feature Decomposition method to accelerate GNN inference · ☆13 · last updated Sep 27, 2021
- ☆13 · last updated Dec 9, 2024
- Subpart source code of deepcore v0.7 · ☆27 · last updated Jun 28, 2020
- Make new landmarks, or landmarks with fewer points, from the dlib ibug_300W_large_face_landmark_dataset · ☆10 · last updated Nov 11, 2018
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation · ☆27 · last updated Nov 7, 2019
- ☆14 · last updated Mar 8, 2025
- Graph-based and Transition-based dependency parsers based on BiLSTMs · ☆18 · last updated Mar 15, 2021
- ncnn export & infer mobileclip · ☆21 · last updated Aug 18, 2025
- Argus is a novel RDMA-assisted job scheduler which achieves high resource utilization by fully exploiting the structure feature of stage … · ☆10 · last updated Apr 13, 2021
- ☆15 · last updated Aug 28, 2025
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows · ☆67 · last updated Apr 12, 2024
- JAX interpreter for Vulkan · ☆16 · last updated Jun 1, 2021