A high-throughput and memory-efficient inference and serving engine for LLMs
☆94 · May 1, 2026 · Updated this week
Alternatives and similar repositories for vllm-musa
Users who are interested in vllm-musa are comparing it to the libraries listed below.
- Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models on MTGPU. ☆35 · Oct 13, 2025 · Updated 6 months ago
- Fork of https://github.com/deepseek-ai/FlashMLA ☆16 · Feb 26, 2025 · Updated last year
- MUSA Templates for Linear Algebra Subroutines ☆45 · Jan 30, 2026 · Updated 3 months ago
- A toolkit for large language models ☆27 · Aug 1, 2025 · Updated 9 months ago
- torch_musa is an open-source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c… ☆492 · Mar 17, 2026 · Updated last month
- RISCV C and Triton AI-Benchmark ☆24 · Jan 28, 2026 · Updated 3 months ago
- An adapter layer that ensures torch_musa🔦 delivers a CUDA-compatible PyTorch experience. ☆34 · Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆195 · Mar 27, 2025 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 · Mar 13, 2024 · Updated 2 years ago
- Solutions to AoC 2022 in Zig ☆12 · May 6, 2023 · Updated 2 years ago
- Offline optimization of your disaggregated Dynamo graph ☆280 · Updated this week
- ncnn is a high-performance neural network inference framework optimized for the mobile platform ☆14 · May 20, 2022 · Updated 3 years ago
- Chinese-language documentation for the Parcel bundler, plus the author's own Parcel usage demos ☆15 · Mar 2, 2018 · Updated 8 years ago
- Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores. ☆74 · Sep 8, 2024 · Updated last year
- ☆10 · Apr 24, 2023 · Updated 3 years ago
- GPU implementation of Winograd convolution ☆10 · Oct 23, 2017 · Updated 8 years ago
- An IR for efficiently simulating distributed ML computation. ☆33 · Jan 13, 2024 · Updated 2 years ago
- This is for Intel OpenVINO (https://software.intel.com/en-us/openvino-toolkit) ☆18 · Oct 24, 2018 · Updated 7 years ago
- ☆15 · May 2, 2018 · Updated 8 years ago
- Source code for cmake-practice.pdf ☆11 · Jun 11, 2014 · Updated 11 years ago
- VastModelZOO is an AI model library maintained by the VastAI-AIS team at Vastai Technologies (瀚博半导体). It provides deployment and training examples of open-source models from multiple AI domains (CV, AUDIO, NLP, LLM, MLLM, etc.) on Vastai training and inference chips. ☆29 · Apr 17, 2026 · Updated 2 weeks ago
- Official repository of the paper "Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models" ☆25 · May 27, 2025 · Updated 11 months ago
- Fast GPU-based tensor core reductions ☆13 · Jan 13, 2023 · Updated 3 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang… ☆13 · Aug 12, 2022 · Updated 3 years ago
- This project is on how to develop 1D Convolutional Neural Network Models for Human Activity Recognition. Below is an example video of a s… ☆12 · May 11, 2020 · Updated 5 years ago
- A unified programming framework for high and portable performance across FPGAs and GPUs ☆11 · Mar 23, 2025 · Updated last year
- New batched algorithm for sparse matrix-matrix multiplication (SpMM) ☆16 · May 7, 2019 · Updated 6 years ago
- Agentman: A tool for building and managing AI agents ☆17 · Jul 10, 2025 · Updated 9 months ago
- A simple API for using CUPTI ☆10 · Aug 19, 2025 · Updated 8 months ago
- An intelligent matrix format designer for SpMV ☆10 · Oct 10, 2023 · Updated 2 years ago
- ☆99 · Apr 24, 2026 · Updated last week
- Mathematical expression evaluator with just-in-time code generation. ☆12 · Apr 7, 2013 · Updated 13 years ago
- ☆20 · May 24, 2025 · Updated 11 months ago
- ☆13 · Jun 23, 2022 · Updated 3 years ago
- Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," Interna… ☆16 · Jan 8, 2021 · Updated 5 years ago
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- Several projects built with the opencv3 image processing library, including face position detection, face special effects, and adding a LOGO above the head ☆11 · Oct 31, 2022 · Updated 3 years ago
- ☆18 · Mar 1, 2025 · Updated last year
- ☆20 · Aug 26, 2021 · Updated 4 years ago