clevercool / TileSparsity
☆140Updated 3 years ago
Related projects
Alternatives and complementary repositories for TileSparsity
- Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks"☆67Updated last year
- SQuant [ICLR22]☆158Updated 2 years ago
- Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"☆50Updated 8 months ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations☆226Updated last month
- Mixed-precision inference with TensorRT-LLM☆93Updated 3 weeks ago
- Supports mixed-precision inference with vLLM☆95Updated 2 weeks ago
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction☆81Updated 3 weeks ago
- Build CUDA Neural Network From Scratch☆19Updated 2 months ago
- ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference☆124Updated 3 weeks ago
- Unified KV Cache Compression Methods for LLMs☆728Updated this week
- QAT (quantization-aware training) for classification with MQBench☆35Updated 3 years ago
- GAL-DAWN: A Novel High-Performance Computing Library of Graph Algorithms based on DAWN, CUDA/C++☆115Updated 3 months ago
- [NeurIPS22] "Advancing Model Pruning via Bi-level Optimization" by Yihua Zhang*, Yuguang Yao*, Parikshit Ram, Pu Zhao, Tianlong Chen, Min…☆141Updated last year
- EDC20: Code repository for the auto_navigation_car based on STM32. Contributed by the team A_star (champion team of the 20th Tsinghua Univ…☆19Updated 5 years ago
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache☆46Updated 3 months ago
- Pruning Filter in Filter (NeurIPS 2020)☆167Updated 8 months ago
- HeFlwr: Federated Learning for Heterogeneous Devices☆130Updated last month
- Chiron: The M-to-N Deployment System for Serverless Workflow☆11Updated last year
- LeNet5 on PYNQ via HLS☆39Updated last year
- Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"☆95Updated last month
- UniInst☆131Updated 10 months ago
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding☆230Updated 2 months ago
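- [grps with trtllm] A pure C++ high-performance OpenAI LLM service built with GPRS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function call modes, AI agents, distributed multi-GPU inference, multimodal input, and a Gradio chat interface.☆92Updated 2 weeks ago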
- Visual Analysis of Metropolitan-Scale Sparse Trajectories☆39Updated 4 years ago
- This repository contains the core methods and models described in the paper “Represent Code as Action Sequence for Predicting Next Method…☆73Updated 2 months ago