A high-throughput and memory-efficient inference and serving engine for LLMs
☆87Apr 9, 2026Updated this week
Alternatives and similar repositories for vllm-musa
Users that are interested in vllm-musa are comparing it to the libraries listed below.
- torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…☆485Mar 17, 2026Updated 3 weeks ago
- RISCV C and Triton AI-Benchmark☆23Jan 28, 2026Updated 2 months ago
- DeepSeek-V3/R1 inference performance simulator☆193Mar 27, 2025Updated last year
- ☆25Mar 15, 2023Updated 3 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆118Mar 13, 2024Updated 2 years ago
- Solutions to AoC 2022 in zig☆12May 6, 2023Updated 2 years ago
- ☆155Mar 4, 2025Updated last year
- A Winograd Minimal Filter Implementation in CUDA☆28Aug 25, 2021Updated 4 years ago
- Run code-llama with 50k tokens using flash attention and better transformer☆12Nov 21, 2023Updated 2 years ago
- ncnn is a high-performance neural network inference framework optimized for the mobile platform☆14May 20, 2022Updated 3 years ago
- ☆32Aug 24, 2022Updated 3 years ago
- ☆10Apr 24, 2023Updated 2 years ago
- Aioli: A unified optimization framework for language model data mixing☆32Jan 17, 2025Updated last year
- Development repository for the Triton-Linalg conversion☆218Feb 7, 2025Updated last year
- FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and…☆17Sep 6, 2018Updated 7 years ago
- An IR for efficiently simulating distributed ML computation.☆33Jan 13, 2024Updated 2 years ago
- this is for intel openvino (https://software.intel.com/en-us/openvino-toolkit)☆18Oct 24, 2018Updated 7 years ago
- ☆71Feb 4, 2026Updated 2 months ago
- ☆15May 2, 2018Updated 7 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…☆13Aug 12, 2022Updated 3 years ago
- Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025)☆19Jan 3, 2026Updated 3 months ago
- New batched algorithm for sparse matrix-matrix multiplication (SpMM)☆16May 7, 2019Updated 6 years ago
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- Mathematical expression evaluator with just in time code generation.☆12Apr 7, 2013Updated 13 years ago
- ☆20May 24, 2025Updated 10 months ago
- LoRA supervised fine-tuning, RLHF (PPO) and RAG with llama-3-8B on the TLDR summarization dataset☆14Feb 2, 2025Updated last year
- ☆13Jun 23, 2022Updated 3 years ago
- DiscreteTom's Blog Boilerplate.☆10Mar 6, 2023Updated 3 years ago
- Convolutional Neural Network of vgg19 model using Cuda to accelerate☆12Jun 11, 2018Updated 7 years ago
- Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," Interna…☆16Jan 8, 2021Updated 5 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
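- Several projects built with the OpenCV 3 image processing library, including face position detection, face effects, and adding a logo above the head☆11Oct 31, 2022Updated 3 years ago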
- ☆15Apr 28, 2023Updated 2 years ago
- ☆20Aug 26, 2021Updated 4 years ago
- ☆25Jan 22, 2020Updated 6 years ago
- LLM deployment project based on ONNX.☆50Oct 9, 2024Updated last year
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆16Feb 28, 2019Updated 7 years ago
- ☆12Mar 31, 2021Updated 5 years ago
- High-performance computing☆22Jan 5, 2020Updated 6 years ago