clevercool / TileSparsity
☆140Updated 3 years ago
Related projects:
- ☆120Updated 3 years ago
- Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks"☆67Updated last year
- ☆71Updated this week
- SQuant [ICLR22]☆158Updated last year
- Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"☆50Updated 6 months ago
- ☆34Updated this week
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction☆43Updated 2 weeks ago
- EDC20: Code repository for the auto_navigation_car based on stm32. Contributed by the team A_star(champion team of the 20th Tsinghua Univ…☆19Updated 5 years ago
- [NeurIPS22] "Advancing Model Pruning via Bi-level Optimization" by Yihua Zhang*, Yuguang Yao*, Parikshit Ram, Pu Zhao, Tianlong Chen, Min…☆141Updated last year
- QAT (quantization-aware training) for classification with MQBench☆35Updated 2 years ago
- Pruning Filter in Filter (NeurIPS 2020)☆167Updated 6 months ago
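- Train an LLM from scratch with DeepSpeed, going through the pretrain and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions☆145Updated 2 months ago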
- The Official Implementation of PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling☆480Updated last month
- YiTu is an easy-to-use runtime to fully exploit the hybrid parallelism of different hardwares (e.g., GPU) to efficiently support the exec…☆350Updated 4 months ago
- ☆33Updated 2 years ago
- GAL-DAWN: A Novel High-Performance Computing Library of Graph Algorithms based on DAWN, CUDA/C++☆116Updated last month
- Secure Transformer Inference is a protocol for serving Transformer-based models securely.☆112Updated 4 months ago
- A Tiny structure of pytorch for learning; a minimal PyTorch implementation☆46Updated 2 months ago
- ☆11Updated last year
- LeNet5 on PYNQ via HLS☆39Updated last year
- A Keras-like framework with JAX as the backend☆122Updated last year
- [NeurIPS-2022] Efficient gRaph sImilarity Computation with Alignment Regularization☆35Updated last year
- UniInst☆131Updated 8 months ago
- Algorithm acceleration landing framework that lets you complete algorithm development at low cost, e.g. Facedetect, FaceLandmark..☆90Updated 3 years ago
- Demo for testing dynamic loading of the libos module.☆12Updated 10 months ago
- [grps with trtllm] A pure C++ high-performance LLM service built by integrating TensorRT-LLM and Tokenizers.cpp, compatible with the OpenAI API protocol and supporting chat and function call modes.☆40Updated 2 weeks ago
- ☆32Updated last year
- EffiBench: Benchmarking the Efficiency of Automatically Generated Code☆50Updated last month
- [ICSE 2022] Explaining ML-powered Code Generation by Referring to Training Examples.☆47Updated 10 months ago
- An I/O-Efficient Disk-based Graph System for Scalable Second-Order Random Walk of Large Graphs☆25Updated 2 years ago