Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33 · Feb 10, 2025 · Updated last year
Alternatives and similar repositories for Tacker
Users that are interested in Tacker are comparing it to the libraries listed below.
- GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving ☆20 · Jul 30, 2025 · Updated 10 months ago
- Reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources ☆12 · May 21, 2024 · Updated 2 years ago
- ☆19 · Mar 4, 2025 · Updated last year
- ☆12 · May 24, 2022 · Updated 4 years ago
- Implementation of Hyena Hierarchy in JAX ☆10 · Apr 30, 2023 · Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 · Jun 21, 2019 · Updated 6 years ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆25 · Updated this week
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆72 · Dec 11, 2025 · Updated 6 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆57 · May 29, 2024 · Updated 2 years ago
- FTPipe and related pipeline model parallelism research. ☆44 · May 16, 2023 · Updated 3 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆19 · Updated this week
- Tutorials of Extending and importing TVM with CMAKE Include dependency. ☆16 · Oct 11, 2024 · Updated last year
- ☆122 · Nov 17, 2023 · Updated 2 years ago
- ☆17 · Jan 24, 2024 · Updated 2 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆13 · Nov 23, 2024 · Updated last year
- ☆38 · Jun 27, 2025 · Updated 11 months ago
- Jetson embedded platform-target deep learning inference acceleration framework with TensorRT ☆30 · Oct 10, 2025 · Updated 8 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆71 · May 1, 2024 · Updated 2 years ago
- POC implementation of "Accelerating HE Operations Using Key Decomposition" [KLSS23] ☆19 · Jun 11, 2025 · Updated last year
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 · Apr 15, 2022 · Updated 4 years ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · May 12, 2024 · Updated 2 years ago
- ☆82 · Jun 23, 2025 · Updated 11 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) ☆17 · Jan 11, 2025 · Updated last year
- ☆26 · Feb 20, 2024 · Updated 2 years ago
- Depict GPU memory footprint during DNN training of PyTorch ☆11 · Nov 17, 2022 · Updated 3 years ago
- ☆78 · May 4, 2021 · Updated 5 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆127 · May 9, 2022 · Updated 4 years ago
- ☆20 · Aug 26, 2021 · Updated 4 years ago
- Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core. ☆30 · Feb 12, 2022 · Updated 4 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆110 · Jun 28, 2025 · Updated 11 months ago
- ngAP's artifact for ASPLOS'24 ☆25 · Jul 29, 2025 · Updated 10 months ago
- Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators ☆114 · Apr 28, 2025 · Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆101 · Sep 19, 2025 · Updated 8 months ago
- Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU ☆22 · Aug 29, 2024 · Updated last year
- Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop ☆68 · Oct 14, 2025 · Updated 7 months ago
- Artifacts for our ASPLOS'23 paper ElasticFlow ☆56 · May 10, 2024 · Updated 2 years ago
- ☆11 · Apr 16, 2023 · Updated 3 years ago
- ☆32 · Jul 17, 2024 · Updated last year
- Create tiny ML systems for on-device learning. ☆19 · Jul 14, 2021 · Updated 4 years ago