A high-throughput and memory-efficient inference and serving engine for LLMs
☆96May 15, 2026Updated last week
Alternatives and similar repositories for vllm-musa
Users who are interested in vllm-musa are comparing it to the libraries listed below.
- torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…☆493Mar 17, 2026Updated 2 months ago
- DeepSeek-V3/R1 inference performance simulator☆194Mar 27, 2025Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆120Mar 13, 2024Updated 2 years ago
- a static analytical model for LLM distributed training☆133May 11, 2026Updated last week
- Solutions to AoC 2022 in zig☆12May 6, 2023Updated 3 years ago
- ☆157Mar 4, 2025Updated last year
- YOLO-World-ONNX is a Python package for running inference on YOLO-WORLD Open-vocabulary-object detection model using ONNX models. It prov…☆17Feb 6, 2026Updated 3 months ago
- ncnn is a high-performance neural network inference framework optimized for the mobile platform☆14May 20, 2022Updated 4 years ago
- First place in the OCR track of the 2019 CCF-BDCI competition, team 天晨破晓: baseline CGAN model for watermark removal☆13Dec 31, 2019Updated 6 years ago
- ☆32Aug 24, 2022Updated 3 years ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆74Sep 8, 2024Updated last year
- ☆10Apr 24, 2023Updated 3 years ago
- GPU implementation of Winograd convolution☆10Oct 23, 2017Updated 8 years ago
- Development repository for the Triton-Linalg conversion☆219Feb 7, 2025Updated last year
- An IR for efficiently simulating distributed ML computation.☆33Jan 13, 2024Updated 2 years ago
- this is for intel openvino (https://software.intel.com/en-us/openvino-toolkit)☆18Oct 24, 2018Updated 7 years ago
- Official repository of paper "Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models"☆26May 27, 2025Updated 11 months ago
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…☆13Aug 12, 2022Updated 3 years ago
- This project is on how to Develop 1D Convolutional Neural Network Models for Human Activity Recognition Below is an example video of a s…☆12May 11, 2020Updated 6 years ago
- A unified programming framework for high and portable performance across FPGAs and GPUs☆11Mar 23, 2025Updated last year
- New batched algorithm for sparse matrix-matrix multiplication (SpMM)☆16May 7, 2019Updated 7 years ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 9 months ago
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- ☆104May 11, 2026Updated last week
- Mathematical expression evaluator with just in time code generation.☆12Apr 7, 2013Updated 13 years ago
- ☆20May 24, 2025Updated 11 months ago
- DiscreteTom's Blog Boilerplate.☆10Mar 6, 2023Updated 3 years ago
- Convolutional Neural Network of vgg19 model using Cuda to accelerate☆12Jun 11, 2018Updated 7 years ago
- Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," Interna…☆16Jan 8, 2021Updated 5 years ago
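- Several projects built with the OpenCV 3 image processing library, including face position detection, face effects, and adding a logo above the head☆11Oct 31, 2022Updated 3 years ago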
- ☆25Jan 22, 2020Updated 6 years ago
- ☆15Apr 28, 2023Updated 3 years ago
- LLM deployment project based on ONNX.☆49Oct 9, 2024Updated last year
- ☆20Aug 26, 2021Updated 4 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆16Feb 28, 2019Updated 7 years ago
- Directed masked autoencoders☆15Mar 25, 2026Updated last month
- ☆12Mar 31, 2021Updated 5 years ago
- High-performance computing☆23Jan 5, 2020Updated 6 years ago