DeepLink-org / AIChipBenchmark
☆30 · Updated 2 weeks ago
Alternatives and similar repositories for AIChipBenchmark
Users interested in AIChipBenchmark are comparing it to the libraries listed below.
- ☆140 · Updated last year
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆265 · Updated last month
- ☆150 · Updated 9 months ago
- ☆129 · Updated 9 months ago
- ☆91 · Updated last week
- ☆59 · Updated 10 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆111 · Updated 4 months ago
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo… ☆67 · Updated 2 weeks ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (a worked sketch of this style of estimate follows the list). ☆115 · Updated last year
- ☆75 · Updated 10 months ago
- ☆70 · Updated 11 months ago
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆132 · Updated 2 years ago
- A tutorial for CUDA & PyTorch. ☆155 · Updated 8 months ago
- An unofficial CUDA assembler for all generations of SASS, hopefully :) ☆83 · Updated 2 years ago
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆478 · Updated last year
- ☆139 · Updated last year
- ☆109 · Updated 6 months ago
- A hands-on tutorial on the core principles of TVM. ☆63 · Updated 4 years ago
- LLM theoretical performance analysis tools supporting parameter-count, FLOPs, memory, and latency analysis. ☆108 · Updated 3 months ago
- ☆37 · Updated 11 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆479 · Updated 6 months ago
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆691 · Updated this week
- heterogeneity-aware-lowering-and-optimization ☆256 · Updated last year
- 🤖FFPA: Extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large headdim; 1.8x~3x↑🎉 vs SDPA EA. ☆220 · Updated 2 months ago
- Efficient operator implementations based on the Cambricon Machine Learning Unit (MLU). ☆134 · Updated last week
- FlagScale is a large model toolkit based on open-source projects. ☆358 · Updated last week
- Transformer-related optimizations, including BERT and GPT. ☆17 · Updated 2 years ago
- Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios. ☆40 · Updated 7 months ago
- ☆99 · Updated last year
- NART (NART is not A RunTime): a deep learning inference framework. ☆37 · Updated 2 years ago
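
For context on what the Roofline comparison and LLM theoretical-performance tools above compute, here is a minimal sketch of a roofline-style per-token latency estimate. It is not taken from any repository in this list; the model size, datatype, and hardware numbers below are illustrative assumptions, not measured specs.

```python
# Minimal roofline-style estimate for LLM decode (illustrative only).

PEAK_FLOPS = 300e12  # assumed fp16 peak throughput, FLOP/s
PEAK_BW = 1.5e12     # assumed HBM bandwidth, bytes/s

def roofline_time(flops: float, bytes_moved: float) -> float:
    """Lower-bound execution time: whichever of compute or memory
    traffic dominates sets the floor (the roofline model)."""
    return max(flops / PEAK_FLOPS, bytes_moved / PEAK_BW)

# Per-token decode cost of a dense LLM, to first order:
#   FLOPs ~= 2 * n_params            (one multiply-add per weight)
#   bytes ~= 2 * n_params            (fp16 weights re-read every token)
n_params = 7e9  # assumed 7B-parameter model
flops_per_token = 2 * n_params
bytes_per_token = 2 * n_params

t = roofline_time(flops_per_token, bytes_per_token)
bound = ("bandwidth" if bytes_per_token / PEAK_BW > flops_per_token / PEAK_FLOPS
         else "compute")
print(f"per-token floor: {t * 1e3:.2f} ms ({bound}-bound)")
```

With these assumed numbers the memory term (~9.3 ms) dominates the compute term (~0.05 ms), so batch-1 decode comes out bandwidth-bound, which is why such analyzers report memory traffic alongside FLOPs.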