ChengZhang-98 / llm-mixed-q
mixed-precision quantization for LLMs
☆14Updated last year
Related projects
Alternatives and complementary repositories for llm-mixed-q
- Implementation of Microscaling data formats in SystemVerilog.☆12Updated 2 months ago
- ☆80Updated last year
- ☆20Updated this week
- MICRO22 artifact evaluation for Sparseloop☆39Updated 2 years ago
- Simulator for BitFusion☆92Updated 4 years ago
- Multi-core HW accelerator mapping optimization framework for layer-fused ML workloads.☆40Updated this week
- ☆22Updated last year
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)☆12Updated 4 months ago
- A co-design architecture on sparse attention☆44Updated 3 years ago
- Neural Network Quantization With Fractional Bit-widths☆12Updated 3 years ago
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs☆24Updated 5 months ago
- ☆34Updated 4 months ago
- ☆25Updated 3 years ago
- ☆41Updated 3 years ago
- ☆24Updated 8 months ago
- ☆30Updated 4 years ago
- Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20)☆27Updated last year
- ☆87Updated 4 months ago
- ☆31Updated 3 years ago
- Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop☆46Updated 2 weeks ago
- A reference implementation of the Mind Mappings Framework.☆28Updated 2 years ago
- ☆16Updated 2 years ago
- Torch2Chip (MLSys, 2024)☆51Updated 2 months ago
- A framework for fast exploration of the depth-first scheduling space for DNN accelerators☆32Updated last year
- [FPGA 2024] Source code and bitstream for LevelST: Stream-based Accelerator for Sparse Triangular Solver☆11Updated 10 months ago
- [ICASSP'20] DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architecture…☆22Updated 2 years ago
- ☆20Updated 2 years ago
- RTL implementation of Flex-DPE.☆91Updated 4 years ago
- ☆19Updated 3 years ago
- BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization (ICLR 2021)☆36Updated 3 years ago