GustavoStahl / CASS
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
☆24 Updated last month
Alternatives and similar repositories for CASS
Users who are interested in CASS are comparing it to the repositories listed below.
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆53 Updated last year
- WaferLLM: Large Language Model Inference at Wafer Scale ☆42 Updated last month
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch ☆38 Updated 4 months ago
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows ☆61 Updated last year
- HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs ☆36 Updated 8 months ago
- Artifacts of EVT ASPLOS'24 ☆26 Updated last year
- ☆51 Updated 2 months ago
- ☆27 Updated last month
- Artifact evaluation of PLDI'24 paper "Allo: A Programming Model for Composable Accelerator Design" ☆28 Updated last year
- LLM Inference analyzer for different hardware platforms ☆83 Updated last month
- ☆19 Updated 10 months ago
- Tile-based language built for AI computation across all scales ☆34 Updated last week
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆51 Updated 4 months ago
- Horizontal Fusion ☆25 Updated 3 years ago
- ☆26 Updated 4 months ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆114 Updated 2 years ago
- H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference ☆46 Updated 3 months ago
- EQueue Dialect ☆40 Updated 3 years ago
- ☆38 Updated 3 years ago
- Canvas: End-to-End Kernel Architecture Search in Neural Networks ☆27 Updated 8 months ago
- Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025 ☆83 Updated 3 months ago
- ☆175 Updated last year
- ☆27 Updated last year
- An analytical framework that models hardware dataflow of tensor applications on spatial architectures using the relation-centric notation… ☆86 Updated last year
- ☆17 Updated 4 months ago
- Asynchronous semantics for architectural simulation and synthesis. ☆44 Updated this week
- Code release for AdapMoE accepted by ICCAD 2024 ☆30 Updated 3 months ago
- OSDI 2023 Welder, deep learning compiler ☆21 Updated last year
- ☆28 Updated 2 years ago
- Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib… ☆17 Updated 4 months ago