DeepLink-org / AIChipBenchmark
☆17 · Updated 2 weeks ago
Related projects:
- FlagGems is an operator library for large language models, implemented in the Triton language. ☆246 · Updated last week
- The DeepSpark open platform selects hundreds of open source application algorithms and models that are deeply coupled with industrial app… ☆26 · Updated this week
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆188 · Updated 3 weeks ago
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo… ☆46 · Updated last week
- This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit… ☆227 · Updated this week
- Ascend PyTorch adapter (torch_npu); mirror of https://gitee.com/ascend/pytorch ☆226 · Updated last week (a hedged usage sketch appears after this list)
- NART = NART is not A RunTime, a deep learning inference framework. ☆38 · Updated last year
- Performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios. ☆20 · Updated 2 weeks ago
- Efficient operator implementations based on the Cambricon Machine Learning Unit (MLU). ☆100 · Updated last week
- Simple Dynamic Batching Inference. ☆145 · Updated 2 years ago (a generic dynamic-batching sketch appears after this list)
- FlagScale is a large model toolkit based on open-source projects. ☆129 · Updated last week
- PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle). ☆67 · Updated last week
- Disaggregated serving system for Large Language Models (LLMs). ☆278 · Updated last month
- Compare different hardware platforms via the roofline model for LLM inference tasks. ☆71 · Updated 6 months ago (a worked roofline sketch appears after this list)
- Transformer-related optimizations, including BERT and GPT. ☆58 · Updated last year
- llm-export can export LLM models to ONNX. ☆193 · Updated this week
- Code-reading notes for TVM. ☆69 · Updated 2 years ago
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆118 · Updated last year
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
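
A minimal usage sketch for the torch_npu adapter listed above, assuming PyTorch, a matching torch_npu build, and an Ascend NPU are available; the tensor shapes and device index are illustrative, not prescribed by the project.

```python
# Minimal torch_npu sketch: run a matmul on an Ascend NPU.
# Assumes a torch_npu build matching the installed PyTorch and an available NPU.
import torch
import torch_npu  # registers the "npu" device type with PyTorch

torch.npu.set_device("npu:0")  # select the first NPU (illustrative index)
a = torch.randn(128, 256, device="npu:0")
b = torch.randn(256, 64, device="npu:0")
c = a @ b  # the matmul is dispatched to the NPU backend
print(c.shape, c.device)
```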
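
The dynamic-batching entry above refers to a common serving technique: queue incoming requests and run one batched forward pass once the batch is full or a wait deadline passes. The sketch below is a generic illustration of that idea, not the listed project's code; `run_model`, the batch size, and the timeout are hypothetical.

```python
# Generic dynamic-batching sketch (not the listed project's code): queue
# incoming requests and flush a batch when it is full or a timeout expires.
import queue
import time

MAX_BATCH = 8      # flush when this many requests are waiting (hypothetical)
MAX_WAIT_S = 0.01  # or when the oldest request has waited this long (hypothetical)

def serve_forever(requests: "queue.Queue", run_model):
    """requests yields input tensors; run_model is a batched forward pass."""
    while True:
        batch = [requests.get()]  # block until at least one request arrives
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = run_model(batch)  # one forward pass for the whole batch
        for out in outputs:         # fan results back out, one per request
            ...  # deliver `out` to its caller (transport is out of scope)
```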
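
For the roofline comparison above, the underlying formula is simple: attainable throughput is min(peak compute, memory bandwidth × arithmetic intensity). The sketch below applies it with hypothetical hardware numbers; the peak FLOP/s, bandwidth, and intensities are placeholders, not measurements of any listed platform.

```python
# Roofline model sketch: attainable FLOP/s = min(peak_flops, bandwidth * intensity).
# All hardware numbers below are hypothetical placeholders.

def roofline(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Attainable FLOP/s for a kernel with the given arithmetic intensity (FLOPs/byte)."""
    return min(peak_flops, bandwidth * intensity)

PEAK = 300e12  # 300 TFLOP/s peak compute (hypothetical)
BW = 2e12      # 2 TB/s memory bandwidth (hypothetical)

# Batch-1 fp16 decode GEMV: ~2 FLOPs per 2-byte weight, intensity ~1 FLOP/byte,
# so it sits on the memory roof.
print(f"decode GEMV:  {roofline(PEAK, BW, 1.0) / 1e12:5.1f} TFLOP/s attainable")
# Prefill GEMMs reuse weights across many tokens; high intensity hits the compute roof.
print(f"prefill GEMM: {roofline(PEAK, BW, 300.0) / 1e12:5.1f} TFLOP/s attainable")
```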