DeepLink-org / AIChipBenchmark
☆33 · Updated last month
Alternatives and similar repositories for AIChipBenchmark
Users interested in AIChipBenchmark are comparing it to the libraries listed below.
- ☆141 · Updated last year
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆298 · Updated last week
- ☆130 · Updated last year
- ☆152 · Updated last year
- FlagCX is a scalable and adaptive cross-chip communication library. ☆170 · Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆123 · Updated last month
- ☆60 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆120 · Updated last year
- ☆119 · Updated 10 months ago
- ☆74 · Updated last week
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo… ☆70 · Updated this week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆84 · Updated 2 years ago
- LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model ☆63 · Updated 3 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆96 · Updated 4 months ago
- High Performance LLM Inference Operator Library ☆603 · Updated this week
- play gemm with tvm ☆92 · Updated 2 years ago
- FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang… ☆197 · Updated this week
- ☆98 · Updated 4 years ago
- ☆105 · Updated last year
- ☆76 · Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆135 · Updated 2 years ago
- code reading for tvm ☆76 · Updated 4 years ago
- ☆73 · Updated last year
- ☆38 · Updated last year
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆44 · Updated 11 months ago
- llm theoretical performance analysis tools and support params, flops, memory and latency analysis. ☆114 · Updated 6 months ago
- ☆19 · Updated last year
- ☆15 · Updated 3 years ago
- SGLang kernel library for NPU ☆96 · Updated this week
- A llama model inference framework implemented in CUDA C++ ☆64 · Updated last year