aoli-al/HFuse

Horizontal Fusion

☆24

Alternatives and similar repositories for HFuse

Users who are interested in HFuse are comparing it to the libraries listed below.

AlibabaResearch / mononn
☆32 • Jul 17, 2024 • Updated 2 years ago
illinois-impact / klap
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆15 • Jun 21, 2019 • Updated 7 years ago
guqiqi / Samoyeds
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores (EuroSys'25)
☆16 • Jul 17, 2025 • Updated last year
AlibabaResearch / recom
An Optimizing Compiler for Recommendation Model Inference
☆26 • Jun 5, 2025 • Updated last year
monellz / FlashTensor
☆19 • Mar 4, 2025 • Updated last year
xxcclong / GNN-Computing
Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations"
☆42 • Nov 16, 2021 • Updated 4 years ago
Deep-Learning-Profiling-Tools / fasten
☆14 • Apr 24, 2024 • Updated 2 years ago
chhzh123 / Krill
An efficient concurrent graph processing system
☆46 • Oct 27, 2021 • Updated 4 years ago
Raphael-Hao / Abacus
☆38 • Jun 27, 2025 • Updated last year
XiangpengHao / build-your-own-s3-select
Build your own S3-Select in 400 lines of Rust!
☆14 • Mar 23, 2025 • Updated last year
GPUPeople / GPUMemManSurvey
Evaluating different memory managers for dynamic GPU memory
☆26 • Dec 16, 2020 • Updated 5 years ago
wahibium / KFF
Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels
☆14 • Aug 26, 2015 • Updated 10 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆246 • Sep 24, 2023 • Updated 2 years ago
SYSU-SCC / sysu-scc-spack-repo
Spack package repository maintained by Student Cluster Competition Team @ Sun Yat-sen University.
☆16 • Aug 20, 2025 • Updated 11 months ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 • Mar 31, 2023 • Updated 3 years ago
xiezhq-hermann / graphiler
Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into ef…
☆59 • Oct 3, 2022 • Updated 3 years ago
yalue / cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
☆96 • Aug 17, 2024 • Updated last year
KuntaiDu / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆13 • Updated this week
Nelson-Cheung / yatsenos-riscv
Rebuild YatSenOS On RISC-V 64.
☆23 • Jan 6, 2022 • Updated 4 years ago
summerspringwei / souffle-ae
☆17 • Jan 24, 2024 • Updated 2 years ago
bigwater / gpunfa-artifact
☆19 • Nov 21, 2022 • Updated 3 years ago
OSU-STARLAB / UVM_benchmark
☆34 • Sep 9, 2020 • Updated 5 years ago
PAA-NCIC / GSWITCH
A pattern-based algorithmic autotuner for graph processing on GPUs.
☆33 • Jun 25, 2025 • Updated last year
zhaiyi000 / tlm
☆49 • Jul 13, 2024 • Updated 2 years ago
SJTU-IPADS / PipeLLM
☆28 • Dec 22, 2024 • Updated last year
HPMLL / ZipServ_ASPLOS26
☆52 • Dec 19, 2025 • Updated 7 months ago
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆125 • Oct 26, 2022 • Updated 3 years ago
PSAL-POSTECH / accelsim_HMS
☆12 • Jul 2, 2024 • Updated 2 years ago
wu-kan / GoPTX
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
☆21 • Jul 30, 2025 • Updated 11 months ago
GATECH-EIC / GCoD
[HPCA 2022] GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design
☆38 • Mar 30, 2022 • Updated 4 years ago
redbird-arch / isca2025-chimera-artifact
Artifact of Chimera
☆18 • May 6, 2025 • Updated last year
getianao / ngAP
ngAP's artifact for ASPLOS'24
☆25 • Jul 29, 2025 • Updated 11 months ago
caoting-dotcom / multiBranchModel
Multi-branch model for concurrent execution
☆18 • Jun 27, 2023 • Updated 3 years ago
hpcgarage / cuASR
cuASR: CUDA Algebra for Semirings
☆50 • Aug 22, 2022 • Updated 3 years ago
YukeWang96 / MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…
☆40 • Mar 17, 2024 • Updated 2 years ago
XiangpengHao / seen
Knowledge management for the impatient
☆26 • Mar 12, 2025 • Updated last year
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33 • Feb 10, 2025 • Updated last year
cowanmeg / cgo-artifact-2020
Artifact repository for paper Automatic Generation of High-Performance Quantized Machine Learning Kernels
☆17 • Oct 13, 2020 • Updated 5 years ago
mi150 / VaLoRA
☆11 • May 19, 2025 • Updated last year