HaoKang-Timmy / torchanalyse
A PyTorch model profiler that reports MACs, energy, etc.
☆12 Updated 6 months ago
Related projects:
- A version of XRBench-MAESTRO used for MLSys 2023 publication ☆22 Updated last year
- The official implementation of the DAC 2024 paper GQA-LUT ☆10 Updated last week
- BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models. ☆49 Updated last year
- FRAME: Fast Roofline Analytical Modeling and Estimation ☆28 Updated 11 months ago
- HW/SW co-design of sentence-level energy optimizations for latency-aware multi-task NLP inference ☆46 Updated 5 months ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆18 Updated last year
- Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning" ☆23 Updated last year
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆42 Updated last year
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning ☆64 Updated 3 weeks ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆98 Updated 9 months ago
- Mixed-precision quantization for LLMs ☆12 Updated 10 months ago
- [FPGA'21] CoDeNet is an efficient object detection model on PyTorch, with SOTA performance on VOC and COCO based on CenterNet and Co-Desi… ☆25 Updated last year
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms ☆10 Updated last year
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆129 Updated last year
- The official code for DATE'23 paper "CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory" ☆20 Updated last month
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs ☆21 Updated 2 months ago
- Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" ☆25 Updated 6 months ago
- [HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design ☆88 Updated last year
- List of papers related to Vision Transformers quantization and hardware acceleration in recent AI conferences and journals. ☆47 Updated 3 months ago
- Code Repository of Evaluating Quantized Large Language Models ☆89 Updated last week
- Multi-Instance-GPU profiling tool ☆51 Updated last year
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆31 Updated 9 months ago
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar… ☆54 Updated 6 months ago
- PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity ☆95 Updated last month
- Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop ☆41 Updated 3 months ago