omni-ai-npu / omni-infer
Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature set.
☆48 · Updated this week
Alternatives and similar repositories for omni-infer
Users interested in omni-infer are comparing it to the libraries listed below.
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆263 · Updated last week
- llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deploy…☆85 · Updated last year
- Ling is a MoE LLM provided and open-sourced by InclusionAI.☆181 · Updated 2 months ago
- ☆172 · Updated this week
- Mixture-of-Experts (MoE) Language Model☆189 · Updated 10 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆136 · Updated last year
- FlagScale is a large model toolkit based on open-sourced projects.☆333 · Updated this week
- LLaMA Factory Document☆146 · Updated 2 weeks ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆246 · Updated last year
- Efficient AI Inference & Serving☆472 · Updated last year
- Accelerate inference without tears☆322 · Updated 4 months ago
- A MoE impl for PyTorch, [ATC'23] SmartMoE☆66 · Updated 2 years ago
- ☆77 · Updated 4 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…☆169 · Updated this week
- Delta-CoMe achieves near-lossless 1-bit compression and has been accepted at NeurIPS 2024☆56 · Updated 8 months ago
- ☆45 · Updated last year
- ☆474 · Updated last week
- A flexible and efficient training framework for large-scale alignment tasks☆400 · Updated this week
- GLM Series Edge Models☆146 · Updated last month
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi…☆268 · Updated last month
- ☆326 · Updated this week
- LLM Inference benchmark☆424 · Updated last year
- A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute☆675 · Updated last year
- ☆111 · Updated last year
- ☆30 · Updated 11 months ago
- An integrated user interface designed for use with the HAI Platform☆52 · Updated 2 years ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang☆55 · Updated 8 months ago
- ☆79 · Updated last year
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray☆128 · Updated 3 weeks ago
- a toolkit on knowledge distillation for large language models☆134 · Updated last week