pprp / ultrascale-playbook-zh
UltraScale Playbook, Chinese edition
☆98 · Updated 9 months ago
Alternatives and similar repositories for ultrascale-playbook-zh
Users that are interested in ultrascale-playbook-zh are comparing it to the libraries listed below
- An annotated nano_vllm repository that also adds MiniCPM4 support and the ability to register new models ☆114 · Updated 4 months ago
- LLM theoretical performance analysis tool supporting parameter, FLOPs, memory, and latency analysis. ☆113 · Updated 5 months ago
- Code release for the book "Efficient Training in PyTorch" ☆114 · Updated 8 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆539 · Updated this week
- 青稞Talk ☆173 · Updated last week
- LLM Inference with Deep Learning Accelerator. ☆56 · Updated 10 months ago
- Learning how CUDA works ☆350 · Updated 9 months ago
- ☆153 · Updated 9 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 · Updated last year
- ☆105 · Updated 2 months ago
- ☆515 · Updated 3 weeks ago
- 🤖 FFPA: Extends FlashAttention-2 with Split-D for ~O(1) SRAM complexity at large head dimensions, 1.8x~3x↑ 🎉 vs. SDPA EA. ☆235 · Updated 3 weeks ago
- LLM training technologies developed by Kwai ☆66 · Updated 2 weeks ago
- Triton documentation in Simplified Chinese ☆95 · Updated 3 weeks ago
- ☆66 · Updated last week
- How to learn PyTorch and OneFlow ☆461 · Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆133 · Updated 2 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆594 · Updated last year
- A lightweight, LLaMA-like LLM inference framework based on Triton kernels. ☆166 · Updated 2 months ago
- ☆150 · Updated 5 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆270 · Updated 4 months ago
- Examples of CUDA implementations using CUTLASS CuTe ☆258 · Updated 5 months ago
- This repository organizes materials, recordings, and schedules related to AI-infra learning meetings. ☆262 · Updated 2 weeks ago
- A collection of memory-efficient attention operators implemented in the Triton language. ☆286 · Updated last year
- ☆439 · Updated 4 months ago
- A paper list on efficient Mixture of Experts for LLMs ☆147 · Updated 2 months ago
- A summary of awesome work on optimizing LLM inference ☆146 · Updated 2 weeks ago
- Materials for learning SGLang ☆682 · Updated 2 weeks ago
- ☆44 · Updated last year
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models. ☆640 · Updated 3 weeks ago