Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆34 Feb 10, 2025 Updated last year
Alternatives and similar repositories for Tacker
Users that are interested in Tacker are comparing it to the libraries listed below.
- GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving ☆20 Jul 30, 2025 Updated 7 months ago
- A reproduction of the libsmctrl paper, with an added Python interface that allows compute resources to be allocated flexibly from Python ☆12 May 21, 2024 Updated last year
- ☆18 Mar 4, 2025 Updated last year
- ☆12 May 24, 2022 Updated 3 years ago
- Implementation of Hyena Hierarchy in JAX ☆10 Apr 30, 2023 Updated 2 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 Jun 21, 2019 Updated 6 years ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆21 Updated this week
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆66 Dec 11, 2025 Updated 3 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆56 May 29, 2024 Updated last year
- FTPipe and related pipeline model parallelism research. ☆44 May 16, 2023 Updated 2 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆19 Updated this week
- Tutorials of Extending and importing TVM with CMAKE Include dependency. ☆15 Oct 11, 2024 Updated last year
- ☆118 Nov 17, 2023 Updated 2 years ago
- ☆17 Jan 24, 2024 Updated 2 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆13 Nov 23, 2024 Updated last year
- ☆38 Jun 27, 2025 Updated 8 months ago
- ☆72 Jun 23, 2025 Updated 9 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆70 May 1, 2024 Updated last year
- ☆20 Sep 28, 2024 Updated last year
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 Apr 15, 2022 Updated 3 years ago
- POC implementation of "Accelerating HE Operations Using Key Decomposition"[KLSS23] ☆18 Jun 11, 2025 Updated 9 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 May 12, 2024 Updated last year
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) ☆17 Jan 11, 2025 Updated last year
- ☆26 Feb 20, 2024 Updated 2 years ago
- Depict GPU memory footprint during DNN training of PyTorch ☆11 Nov 17, 2022 Updated 3 years ago
- ☆78 May 4, 2021 Updated 4 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆127 May 9, 2022 Updated 3 years ago
- Horizontal Fusion ☆24 Jan 7, 2022 Updated 4 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 Jun 28, 2025 Updated 8 months ago
- ☆19 Aug 26, 2021 Updated 4 years ago
- Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core. ☆30 Feb 12, 2022 Updated 4 years ago
- ngAP's artifact for ASPLOS'24 ☆26 Jul 29, 2025 Updated 7 months ago
- Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators ☆111 Apr 28, 2025 Updated 10 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆97 Sep 19, 2025 Updated 6 months ago
- Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop ☆65 Oct 14, 2025 Updated 5 months ago
- ☆11 Apr 16, 2023 Updated 2 years ago
- Artifacts for our ASPLOS'23 paper ElasticFlow ☆56 May 10, 2024 Updated last year
- ☆13 Nov 1, 2021 Updated 4 years ago
- ☆33 Jul 17, 2024 Updated last year