☆19 · Aug 26, 2021 · Updated 4 years ago
Alternatives and similar repositories for APNN-TC
Users who are interested in APNN-TC are comparing it to the libraries listed below.
- A framework for pipelined computing on GPU ☆30 · Jul 17, 2019 · Updated 6 years ago
- ☆36 · Jul 25, 2022 · Updated 3 years ago
- Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations" ☆41 · Nov 16, 2021 · Updated 4 years ago
- ☆50 · Jun 27, 2019 · Updated 6 years ago
- Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels ☆14 · Aug 26, 2015 · Updated 10 years ago
- Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core. ☆30 · Feb 12, 2022 · Updated 4 years ago
- Post-training sparsity-aware quantization ☆34 · Feb 26, 2023 · Updated 3 years ago
- ☆13 · Jan 23, 2021 · Updated 5 years ago
- The official implementation of BiViT: Extremely Compressed Binary Vision Transformers ☆16 · Jun 18, 2023 · Updated 2 years ago
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆17 · Nov 10, 2016 · Updated 9 years ago
- Getting Started with NIMBUS-CORE ☆10 · Dec 16, 2023 · Updated 2 years ago
- ☆23 · Jan 7, 2022 · Updated 4 years ago
- ☆19 · Jul 1, 2020 · Updated 5 years ago
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, especially optimized for diffusion models. ☆24 · Mar 29, 2024 · Updated last year
- ngAP's artifact for ASPLOS'24 ☆25 · Jul 29, 2025 · Updated 7 months ago
- ☆22 · Feb 18, 2025 · Updated last year
- High-performance RDMA-based distributed feature collection component for training GNN models on EXTREMELY large graphs ☆56 · Jul 3, 2022 · Updated 3 years ago
- Implementation of the Winograd algorithm. ☆24 · Nov 6, 2018 · Updated 7 years ago
- ☆26 · Aug 19, 2022 · Updated 3 years ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆31 · Mar 12, 2024 · Updated last year
- This is the release repository of SuperNeurons ☆54 · Feb 13, 2021 · Updated 5 years ago
- A library of GPU kernels for sparse matrix operations. ☆283 · Nov 24, 2020 · Updated 5 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆34 · Feb 10, 2025 · Updated last year
- ☆112 · Jul 3, 2021 · Updated 4 years ago
- Prefetching and efficient data path for memory disaggregation ☆69 · Jul 16, 2020 · Updated 5 years ago
- Collection of model quantization algorithms. For any issues, please contact Peng Chen (blueardour@gmail.com) ☆73 · Oct 7, 2021 · Updated 4 years ago
- Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018) ☆193 · May 7, 2019 · Updated 6 years ago
- A Python implementation of the Hopfield network used to solve the traveling salesman problem ☆10 · Apr 11, 2019 · Updated 6 years ago
- Convolutional Channel-wise Competitive Learning for the Forward-Forward Algorithm. AAAI 2024 ☆11 · Jun 27, 2024 · Updated last year
- HPA2021 solution (3rd place) ☆10 · Oct 13, 2021 · Updated 4 years ago
- The SEAL-CPU backend is a Reference backend engine for HEBench which is a shared library that implements the required functions specified… ☆11 · Mar 3, 2023 · Updated 3 years ago
- Code for pre-training CharacterBERT models (as well as BERT models). ☆34 · Sep 6, 2021 · Updated 4 years ago
- FPGA and GPU acceleration of LeNet5 ☆35 · Jul 9, 2019 · Updated 6 years ago
- ☆41 · Aug 12, 2024 · Updated last year
- TLB Benchmarks ☆35 · Sep 11, 2017 · Updated 8 years ago
- Fine-grained GPU sharing primitives ☆148 · Jul 28, 2025 · Updated 7 months ago
- Dorylus: Affordable, Scalable, and Accurate GNN Training ☆76 · May 31, 2021 · Updated 4 years ago
- PyTorch fixed-point training tool/framework ☆34 · Oct 14, 2020 · Updated 5 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆35 · Jul 28, 2020 · Updated 5 years ago