JieRen98 / SGEMM-SASS-Annotation
☆21Mar 22, 2021Updated 4 years ago
Alternatives and similar repositories for SGEMM-SASS-Annotation
Users that are interested in SGEMM-SASS-Annotation are comparing it to the libraries listed below
- ☆17Nov 22, 2025Updated 2 months ago
- standalone boost preprocessor library☆13Apr 16, 2013Updated 12 years ago
- Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.☆18Jan 12, 2022Updated 4 years ago
- Whisper in TensorRT-LLM☆17Sep 21, 2023Updated 2 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆145Aug 18, 2020Updated 5 years ago
- A simple high performance CUDA GEMM implementation.☆426Jan 4, 2024Updated 2 years ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation☆32Nov 16, 2024Updated last year
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆72Sep 8, 2024Updated last year
- This is a repository to practice multi-thread programming in C++☆28Feb 21, 2024Updated last year
- Optimize GEMM with tensorcore step by step☆36Dec 17, 2023Updated 2 years ago
- The repository contains a reference end-to-end pipeline for a real-time video analytics application. Realtime data is provided to an infe…☆11Nov 3, 2025Updated 3 months ago
- ☆54May 5, 2025Updated 9 months ago
- ☆97Mar 26, 2025Updated 10 months ago
- Using the pvanet framework to train mobilenet-v2 for object detection; paper: https://arxiv.org/abs/1611.08588☆13Feb 13, 2019Updated 7 years ago
- ☆49Apr 15, 2024Updated last year
- ☆10Dec 31, 2018Updated 7 years ago
- Algorithms of facial recognition through sketches☆13May 9, 2014Updated 11 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 4 months ago
- Protocol buffers and other common resources.☆13Jan 20, 2026Updated 3 weeks ago
- ☆10Aug 16, 2021Updated 4 years ago
- Deploys the anchor-free YOLOR series with ONNXRuntime, with both C++ and Python versions of the program☆41Sep 18, 2021Updated 4 years ago
- ☆54Mar 15, 2025Updated 10 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆40Feb 29, 2024Updated last year
- ☆105Sep 9, 2024Updated last year
- A simplified flash-attention implementation using cutlass, intended for teaching purposes☆56Aug 12, 2024Updated last year
- GEMM☆10Aug 26, 2023Updated 2 years ago
- ☆10May 24, 2020Updated 5 years ago
- ANDROID APP to AUTO GENERATE SUBTITLE FILE and TRANSLATED SUBTITLE FILE (using unofficial online Google Translate API) for any audio/vide…☆19May 5, 2024Updated last year
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆29Updated this week
- Based on emapgo (易图通) HD map services; retrieves map messages to build decision orders on a ROS system.☆10Sep 24, 2020Updated 5 years ago
- ☆11Sep 21, 2022Updated 3 years ago
- Created a simple neural network using C++17 standard and the Eigen library that supports both forward and backward propagation.☆10Jul 27, 2024Updated last year
- The C++ matting code is based on BackgroundMattingV2 and RobustVideoMatting.☆11Nov 20, 2021Updated 4 years ago
- CenterNet3D deployment version, easy to port to different platforms (onnx, tensorRT, rknn, Horizon).☆13May 24, 2024Updated last year
- Tools for checking if code is ready for python3☆10Sep 18, 2020Updated 5 years ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.☆13Sep 29, 2024Updated last year
- ☆12May 12, 2017Updated 8 years ago
- Implementation of joint bayesian model, written in python.☆11Aug 2, 2021Updated 4 years ago
- ☆14Sep 12, 2024Updated last year