DeepLink-org / AIChipBenchmarkLinks

☆30

Alternatives and similar repositories for AIChipBenchmark

Users that are interested in AIChipBenchmark are comparing it to the libraries listed below

Sorting:

OpenPPL / ppl.nn.llm
☆140Updated last year
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆265Updated last month
OpenPPL / ppl.llm.kernel.cuda
☆150Updated 9 months ago
OpenPPL / ppl.llm.serving
☆129Updated 9 months ago
FlagOpen / FlagCX
☆91Updated last week
OpenPPL / ppl.pmx
☆59Updated 10 months ago
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆111Updated 4 months ago
Deep-Spark / DeepSparkHub
DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo…
☆67Updated 2 weeks ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆115Updated last year
DeepLink-org / DIOPI
☆75Updated 10 months ago
DeepLink-org / deeplink.framework
☆70Updated 11 months ago
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆132Updated 2 years ago
CalvinXKY / BasicCUDA
A tutorial for CUDA&PyTorch
☆155Updated 8 months ago
OpenPPL / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆83Updated 2 years ago
bytedance / ByteTransformer
optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
☆478Updated last year
AyakaGEMM / Hands-on-GEMM
☆139Updated last year
MARD1NO / CUDA-PPT
☆109Updated 6 months ago
whitelok / tvm-lesson
动手学习TVM核心原理教程
☆63Updated 4 years ago
harleyszhang / llm_counts
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
☆108Updated 3 months ago
OpenPPL / ppl.kernel.cuda
☆37Updated 11 months ago
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆479Updated 6 months ago
FlagOpen / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆691Updated this week
alibaba / heterogeneity-aware-lowering-and-optimization
heterogeneity-aware-lowering-and-optimization
☆256Updated last year
xlite-dev / ffpa-attn
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
☆220Updated 2 months ago
Cambricon / mlu-ops
Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU) .
☆134Updated last week
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆358Updated last week
Rayrtfr / FasterTransformer
Transformer related optimization, including BERT, GPT
☆17Updated 2 years ago
Bruce-Lee-LY / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆40Updated 7 months ago
AlibabaPAI / FLASHNN
☆99Updated last year
ModelTC / NART
NART = NART is not A RunTime, a deep learning inference framework.
☆37Updated 2 years ago