☆21Mar 22, 2021Updated 4 years ago
Alternatives and similar repositories for SGEMM-SASS-Annotation
Users who are interested in SGEMM-SASS-Annotation are comparing it to the repositories listed below
- ☆13Jan 7, 2025Updated last year
- ☆17Nov 22, 2025Updated 3 months ago
- standalone boost preprocessor library☆13Apr 16, 2013Updated 12 years ago
- Whisper in TensorRT-LLM☆17Sep 21, 2023Updated 2 years ago
- Super fast, accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.☆19Jan 12, 2022Updated 4 years ago
- Manages vllm-nccl dependency☆17Jun 3, 2024Updated last year
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆43Oct 20, 2023Updated 2 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆144Aug 18, 2020Updated 5 years ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation☆32Nov 16, 2024Updated last year
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆73Sep 8, 2024Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).☆25Feb 22, 2026Updated last week
- This is a repository to practice multi-thread programming in C++☆28Feb 21, 2024Updated 2 years ago
- ☆53Feb 24, 2026Updated last week
- All Resources from Stanford CS106B 2021☆24Jul 11, 2025Updated 7 months ago
- ☆97Mar 26, 2025Updated 11 months ago
- Algorithms of facial recognition through sketches☆13May 9, 2014Updated 11 years ago
- ☆10Aug 16, 2021Updated 4 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 5 months ago
- amdgpu example code in hip/asm☆55Feb 28, 2026Updated last week
- ☆49Apr 15, 2024Updated last year
- This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate…☆13Dec 31, 2024Updated last year
- ☆10Dec 31, 2018Updated 7 years ago
- Deploying the anchor-free YOLOR series with ONNXRuntime, including both C++ and Python versions☆41Sep 18, 2021Updated 4 years ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆40Feb 29, 2024Updated 2 years ago
- Machine Learning Compilation, Tianqi Chen☆54Jan 1, 2023Updated 3 years ago
- ☆54Mar 15, 2025Updated 11 months ago
- ☆104Sep 9, 2024Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆96Feb 20, 2026Updated 2 weeks ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models☆53Aug 9, 2024Updated last year
- A simplified flash-attention implementation using cutlass, intended for teaching☆58Aug 12, 2024Updated last year
- GEMM☆10Aug 26, 2023Updated 2 years ago
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation☆22May 29, 2025Updated 9 months ago
- KWANT is an open source C++ toolkit for computing scores and other metrics for object tracking systems.☆11Jan 22, 2026Updated last month
- Created a simple neural network using C++17 standard and the Eigen library that supports both forward and backward propagation.☆10Jul 27, 2024Updated last year
- Async C++ TCP server based on coroutines☆10Oct 2, 2021Updated 4 years ago
- ☆10May 24, 2020Updated 5 years ago
- Implementation of "Pixel-In-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild"☆11Jul 6, 2023Updated 2 years ago
- Based on emapgo (易图通) HD map services; retrieves map messages to build decision orders on a ROS system.☆10Sep 24, 2020Updated 5 years ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.☆13Sep 29, 2024Updated last year