modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance on various hardware architectures, including x86 and ARMv9.
☆137 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for dash-infer
- ☆124 · Updated 2 weeks ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆547 · Updated last month
- LLM inference benchmark ☆350 · Updated 4 months ago
- ☆290 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆124 · Updated 11 months ago
- Transformer-related optimization, including BERT and GPT ☆39 · Updated last year
- llm-export exports LLMs to ONNX. ☆231 · Updated last week
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆236 · Updated 8 months ago
- ☆145 · Updated this week
- ☆100 · Updated 7 months ago
- C++ implementation of Qwen-LM ☆554 · Updated 10 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆76 · Updated 8 months ago
- A MoE implementation for PyTorch ([ATC'23] SmartMoE) ☆57 · Updated last year
- Run ChatGLM2-6B on the BM1684X ☆48 · Updated 8 months ago
- A streamlined and customizable framework for efficient large model evaluation and performance benchmarking ☆262 · Updated this week
- Mixture-of-Experts (MoE) Language Model ☆180 · Updated 2 months ago
- FlagScale is a large model toolkit based on open-source projects. ☆178 · Updated this week
- Export LLaMA to ONNX ☆98 · Updated 5 months ago
- Efficient AI Inference & Serving ☆458 · Updated 10 months ago
- ☆74 · Updated 11 months ago
- ☆291 · Updated 4 months ago
- ☆140 · Updated 7 months ago
- llama2.c-zh: a small language model supporting Chinese-language scenarios ☆146 · Updated 8 months ago
- Imitate OpenAI with Local Models ☆85 · Updated 2 months ago
- A flexible and efficient training framework for large-scale alignment tasks ☆211 · Updated this week
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆324 · Updated this week
- LLM101n: Let's build a Storyteller (Chinese edition) ☆119 · Updated 3 months ago
- Transformer-related optimization, including BERT and GPT ☆17 · Updated last year
- LLaMA inference for TencentPretrain ☆96 · Updated last year
- ☆381 · Updated last week