JieRen98 / SGEMM-SASS-Annotation
☆21Mar 22, 2021Updated 4 years ago
Alternatives and similar repositories for SGEMM-SASS-Annotation
Users that are interested in SGEMM-SASS-Annotation are comparing it to the libraries listed below
- ☆17Nov 22, 2025Updated 2 months ago
- standalone boost preprocessor library☆13Apr 16, 2013Updated 12 years ago
- Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.☆18Jan 12, 2022Updated 4 years ago
- Whisper in TensorRT-LLM☆17Sep 21, 2023Updated 2 years ago
- Manages vllm-nccl dependency☆17Jun 3, 2024Updated last year
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆43Oct 20, 2023Updated 2 years ago
- A practical way of learning Swizzle☆36Feb 3, 2025Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆145Aug 18, 2020Updated 5 years ago
- A simple high performance CUDA GEMM implementation.☆426Jan 4, 2024Updated 2 years ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆72Sep 8, 2024Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).☆25Jul 15, 2025Updated 6 months ago
- ☆34Feb 3, 2025Updated last year
- The repository contains a reference end-to-end pipeline for a real-time video analytics application. Realtime data is provided to an infe…☆11Nov 3, 2025Updated 3 months ago
- All Resources from Stanford CS106B 2021☆23Jul 11, 2025Updated 7 months ago
- ☆97Mar 26, 2025Updated 10 months ago
- ☆10Dec 31, 2018Updated 7 years ago
- ☆10Aug 16, 2021Updated 4 years ago
- Protocol buffers and other common resources.☆13Jan 20, 2026Updated 3 weeks ago
- Algorithms of facial recognition through sketches☆13May 9, 2014Updated 11 years ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆40Feb 29, 2024Updated last year
- Machine Learning Compilation, Tianqi Chen☆53Jan 1, 2023Updated 3 years ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models☆51Aug 9, 2024Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆96Sep 13, 2025Updated 5 months ago
- ☆105Sep 9, 2024Updated last year
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆29Updated this week
- 🚀 LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases.☆13Jul 12, 2025Updated 7 months ago
- ☆12Jan 19, 2020Updated 6 years ago
- CenterNet3D deployment version, easy to port to different platforms (onnx, tensorRT, rknn, Horizon).☆13May 24, 2024Updated last year
- Based on emapgo (易图通) HD map services; gets map messages to build decision orders on a ROS system.☆10Sep 24, 2020Updated 5 years ago
- Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score.☆13Jan 21, 2021Updated 5 years ago
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation☆22May 29, 2025Updated 8 months ago
- ☆11Jan 25, 2021Updated 5 years ago
- DTLC-GAN Tensorflow☆12Aug 29, 2018Updated 7 years ago
- Use a YOLOv3 ONNX model to implement object detection☆11Apr 25, 2019Updated 6 years ago
- ☆15Dec 1, 2023Updated 2 years ago
- ☆11Sep 21, 2022Updated 3 years ago
- A simple neural network built with the C++17 standard and the Eigen library, supporting both forward and backward propagation.☆10Jul 27, 2024Updated last year
- KWANT is an open source C++ toolkit for computing scores and other metrics for object tracking systems.☆11Jan 22, 2026Updated 3 weeks ago
- The C++ matting code is based on BackgroundMattingV2 and RobustVideoMatting.☆11Nov 20, 2021Updated 4 years ago