☆21 · Mar 22, 2021 · Updated 5 years ago
Alternatives and similar repositories for SGEMM-SASS-Annotation
Users who are interested in SGEMM-SASS-Annotation are comparing it to the libraries listed below.
- ☆18 · Nov 22, 2025 · Updated 4 months ago
- Manages vllm-nccl dependency · ☆17 · Jun 3, 2024 · Updated last year
- Standalone Boost preprocessor library · ☆13 · Apr 16, 2013 · Updated 13 years ago
- A simple high performance CUDA GEMM implementation. · ☆430 · Jan 4, 2024 · Updated 2 years ago
- ☆10 · Nov 29, 2022 · Updated 3 years ago
- amdgpu example code in HIP/asm · ☆58 · Mar 18, 2026 · Updated 3 weeks ago
- Source code for matrix multiplication implementations on CUDA · ☆34 · Sep 12, 2018 · Updated 7 years ago
- [EMNLP 2023] Official implementation of ETSC (Exact Toeplitz-to-SSM Conversion), the algorithm from our EMNLP 2023 paper - Accelerating Toeplitz… · ☆14 · Oct 17, 2023 · Updated 2 years ago
- ☆13 · Jan 7, 2025 · Updated last year
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen (Qwen-7B) model with TensorRT-LLM · ☆43 · Oct 20, 2023 · Updated 2 years ago
- ☆12 · Jan 19, 2020 · Updated 6 years ago
- Whisper in TensorRT-LLM · ☆17 · Sep 21, 2023 · Updated 2 years ago
- Implementation of an asm GEMM on Vega 64 for 4096x4096 fp32 matrices · ☆22 · Oct 12, 2019 · Updated 6 years ago
- A practical way of learning Swizzle · ☆37 · Feb 3, 2025 · Updated last year
- Uses the pvanet framework to train mobilenet-v2 for object detection; paper: https://arxiv.org/abs/1611.08588 · ☆13 · Feb 13, 2019 · Updated 7 years ago
- [SIGGRAPH ASIA 2024] Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane · ☆20 · Nov 25, 2024 · Updated last year
- A repository for practicing multithreaded programming in C++ · ☆28 · Feb 21, 2024 · Updated 2 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) · ☆145 · Aug 18, 2020 · Updated 5 years ago
- ☆62 · Apr 3, 2026 · Updated last week
- [TOG 2024] BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation · ☆16 · Jun 14, 2024 · Updated last year
- ☆16 · Mar 23, 2023 · Updated 3 years ago
- ☆49 · Apr 15, 2024 · Updated 2 years ago
- A collection of papers, code, blogs, and related materials on AEC · ☆15 · Jul 26, 2022 · Updated 3 years ago
- ☆20 · Apr 17, 2023 · Updated 2 years ago
- KWANT is an open-source C++ toolkit for computing scores and other metrics for object tracking systems. · ☆11 · Jan 22, 2026 · Updated 2 months ago
- ☆10 · Dec 31, 2018 · Updated 7 years ago
- [TPAMI 2023] This is an official implementation for "Vicinity Vision Transformer". · ☆22 · Jun 15, 2023 · Updated 2 years ago
- This repository provides a tutorial on running a sample publisher and subscriber using multiple transports of point_cloud_trans… · ☆11 · Mar 17, 2026 · Updated 3 weeks ago
- This repository contains my implementation of a shape-constrained network that runs predictions at up to 170 FPS · ☆12 · Feb 12, 2019 · Updated 7 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to the GPU mode AMD challenge. · ☆19 · Feb 9, 2026 · Updated 2 months ago
- The implementation of the paper “Multiparameter modeling with ANN for antenna design”. · ☆14 · Apr 13, 2020 · Updated 6 years ago
- Deployment version of CenterNet3D, designed for easy porting to different platforms (onnx, tensorRT, rknn, Horizon). · ☆13 · May 24, 2024 · Updated last year
- This repo contains all the code, slides and other reference documents used in community sessions. · ☆14 · Mar 29, 2023 · Updated 3 years ago
- ☆113 · Apr 19, 2024 · Updated last year
- Machine Learning Compilation (Tianqi Chen) · ☆56 · Jan 1, 2023 · Updated 3 years ago
- ☆15 · Dec 1, 2023 · Updated 2 years ago
- ☆14 · Jun 4, 2024 · Updated last year
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. · ☆74 · Sep 8, 2024 · Updated last year
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation · ☆32 · Nov 16, 2024 · Updated last year