MooreThreads / vllm_musa
A high-throughput and memory-efficient inference and serving engine for LLMs
☆64 · Updated 11 months ago
Alternatives and similar repositories for vllm_musa
Users interested in vllm_musa are comparing it to the repositories listed below.
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆265 · Updated 2 months ago
- Run generative AI models on Sophgo BM1684X/BM1688 chips. ☆248 · Updated last week
- ☆129 · Updated 9 months ago
- ☆50 · Updated 11 months ago
- llm-export can export LLM models to ONNX. ☆313 · Updated last month
- ☆140 · Updated last year
- ☆63 · Updated 3 weeks ago
- Run ChatGLM2-6B on the BM1684X. ☆49 · Updated last year
- PaddlePaddle (『飞桨』) custom device implementation for custom hardware integration. ☆96 · Updated last week
- FlagScale is a large model toolkit based on open-source projects. ☆358 · Updated last week
- ☆70 · Updated 11 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆437 · Updated 3 weeks ago
- Triton Documentation in Simplified Chinese / Triton 中文文档 ☆85 · Updated 5 months ago
- ☆59 · Updated 10 months ago
- torch_musa is an open-source repository based on PyTorch that makes full use of the computing power of MooreThreads graphics cards. ☆435 · Updated 3 weeks ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (a minimal worked example follows this list). ☆115 · Updated last year
- ☆430 · Updated 3 weeks ago
- ☆174 · Updated this week
- ☆30 · Updated 2 weeks ago
- LLM101n: Let's build a Storyteller (Chinese edition). ☆132 · Updated last year
- Export LLaMA models to ONNX. ☆136 · Updated 9 months ago
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to support … ☆67 · Updated 2 weeks ago
- LLM inference service performance benchmarking. ☆43 · Updated last year
- ☆503 · Updated 3 weeks ago
- ☆75 · Updated 10 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆874 · Updated last week
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆197 · Updated 3 weeks ago
- A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models. ☆579 · Updated last month
- ☆150 · Updated 9 months ago
- Accelerate LLMs with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm. ☆168 · Updated 5 months ago
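
The Roofline Model entry above compares hardware with a simple rule: attainable throughput is the minimum of the compute roof (peak FLOP/s) and the memory roof (bandwidth times arithmetic intensity). Below is a minimal Python sketch of that comparison for single-batch LLM decoding; the hardware figures, byte widths, and 7B model size are illustrative assumptions, not specs of any platform in this list.

```python
# Minimal sketch of a Roofline comparison for LLM inference.
# All hardware numbers below are made-up assumptions for illustration.

def roofline_flops(peak_flops: float, mem_bw_bytes: float, intensity: float) -> float:
    """Attainable FLOP/s = min(compute roof, memory bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bw_bytes * intensity)

# Batch-1 decode of a dense transformer is memory-bound: each generated token
# performs ~2 FLOPs per parameter while reading every parameter once, so
# arithmetic intensity is ~2 FLOPs / bytes-per-parameter.
BYTES_PER_PARAM = 2                    # FP16 weights (assumption)
intensity = 2 / BYTES_PER_PARAM        # ~1 FLOP per byte at batch size 1

platforms = {                          # hypothetical (peak FLOP/s, bandwidth B/s)
    "gpu_a": (100e12, 1.0e12),
    "gpu_b": (60e12, 1.6e12),
}

PARAMS = 7e9                           # assumed 7B-parameter dense model
for name, (peak, bw) in platforms.items():
    attainable = roofline_flops(peak, bw, intensity)
    tokens_per_s = attainable / (2 * PARAMS)   # FLOPs per token ~= 2 * params
    print(f"{name}: ~{tokens_per_s:.0f} decode tokens/s (memory-bound estimate)")
```

At batch size 1 the memory roof dominates on both hypothetical devices, which is why memory bandwidth, rather than peak FLOP/s, usually decides single-stream decode throughput in such comparisons.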