A high-throughput and memory-efficient inference and serving engine for LLMs
☆100Jun 5, 2026Updated last week
Alternatives and similar repositories for vllm-musa
Users that are interested in vllm-musa are comparing it to the libraries listed below.
- Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models on MTGPU.☆35Oct 13, 2025Updated 8 months ago
- torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…☆497Mar 17, 2026Updated 2 months ago
- An adapter layer that ensures torch_musa🔦 delivers a CUDA-compatible PyTorch experience.☆36Updated this week
- DeepSeek-V3/R1 inference performance simulator☆196Mar 27, 2025Updated last year
- a static analytical model for LLM distributed training☆155May 11, 2026Updated last month
- Solutions to AoC 2022 in zig☆12May 6, 2023Updated 3 years ago
- ☆157Mar 4, 2025Updated last year
- PyTorch code examples for measuring the performance of collective communication calls in AI workloads☆21Sep 18, 2025Updated 8 months ago
- A Winograd Minimal Filter Implementation in CUDA☆30Aug 25, 2021Updated 4 years ago
- YOLO-World-ONNX is a Python package for running inference on YOLO-WORLD Open-vocabulary-object detection model using ONNX models. It prov…☆17Feb 6, 2026Updated 4 months ago
- ncnn is a high-performance neural network inference framework optimized for the mobile platform☆14May 20, 2022Updated 4 years ago
- First place in the OCR track of the 2019 CCF-BDCI competition, by team 天晨破晓: baseline CGAN model for the watermark-removal network☆13Dec 31, 2019Updated 6 years ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.☆74Sep 8, 2024Updated last year
- Aioli: A unified optimization framework for language model data mixing☆32Jan 17, 2025Updated last year
- ☆10Apr 24, 2023Updated 3 years ago
- GPU implementation of Winograd convolution☆10Oct 23, 2017Updated 8 years ago
- Development repository for the Triton-Linalg conversion☆219Feb 7, 2025Updated last year
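- VastModelZOO is an AI model library maintained by the VastAI-AIS team at 瀚博半导体 (Vastai Technologies). It provides deployment and training examples for open-source models from multiple AI domains (CV, AUDIO, NLP, LLM, MLLM, etc.) on the company's training and inference chips.☆29Updated this week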
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- ☆27Mar 17, 2025Updated last year
- This project is on how to Develop 1D Convolutional Neural Network Models for Human Activity Recognition Below is an example video of a s…☆12May 11, 2020Updated 6 years ago
- A unified programming framework for high and portable performance across FPGAs and GPUs☆11Mar 23, 2025Updated last year
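- Uses the open-source PydanticAI framework to build a Text2SQL application on PostgreSQL and MySQL for SQL statement generation; supports GPT models, Chinese domestic LLMs, and open-source local LLMs☆16Dec 26, 2024Updated last year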
- a simple API to use CUPTI☆10Aug 19, 2025Updated 9 months ago
- ☆15Mar 30, 2024Updated 2 years ago
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- ☆107May 11, 2026Updated last month
- Mathematical expression evaluator with just in time code generation.☆12Apr 7, 2013Updated 13 years ago
- ☆13Jun 23, 2022Updated 3 years ago
- DiscreteTom's Blog Boilerplate.☆10Mar 6, 2023Updated 3 years ago
- Convolutional neural network of the VGG19 model, accelerated using CUDA☆12Jun 11, 2018Updated 8 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- ☆18Mar 1, 2025Updated last year
- ☆15Apr 28, 2023Updated 3 years ago
- LLM deployment project based on ONNX.☆49Oct 9, 2024Updated last year
- ☆20Aug 26, 2021Updated 4 years ago
- ☆10May 27, 2025Updated last year
- Directed masked autoencoders☆15Mar 25, 2026Updated 2 months ago
- ☆12Mar 31, 2021Updated 5 years ago