cornell-zhang / llm-datatypes
Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
☆24 Updated 4 months ago
Related projects
Alternatives and complementary repositories for llm-datatypes
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆16 Updated 2 weeks ago
- QuickEst repository: Quick Estimation of Quality of Results ☆26 Updated 6 years ago
- mixed-precision quantization for LLMs ☆13 Updated last year
- Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning" ☆23 Updated last year
- ☆81 Updated 4 months ago
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆12 Updated 4 months ago
- ☆11 Updated 3 years ago
- ☆22 Updated last year
- [HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System ☆34 Updated 8 months ago
- DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators ☆13 Updated 3 weeks ago
- ☆15 Updated 2 years ago
- ☆79 Updated 11 months ago
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning ☆75 Updated 2 months ago
- Graph-learning assisted instruction vulnerability estimation published in DATE 2020 ☆13 Updated 3 years ago
- ☆9 Updated 5 months ago
- ☆25 Updated 3 years ago
- PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization ☆25 Updated 8 months ago
- ☆20 Updated this week
- ☆18 Updated 4 months ago
- ☆38 Updated 7 months ago
- A graph linear algebra overlay ☆49 Updated last year
- Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits ☆21 Updated 2 months ago
- BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization (ICLR 2021) ☆36 Updated 3 years ago
- MICRO22 artifact evaluation for Sparseloop ☆38 Updated 2 years ago
- ACM TODAES Best Paper Award, 2022 ☆24 Updated last year
- Serpens is an HBM FPGA accelerator for SpMV ☆14 Updated 3 months ago
- A Generic Distributed Auto-Tuning Infrastructure ☆21 Updated 3 years ago
- ☆18 Updated 2 years ago
- ☆31 Updated 3 years ago
- TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆21 Updated last month