dlsyscourse / hw2
☆6 · Updated 5 months ago
Alternatives and similar repositories for hw2:
Users interested in hw2 are comparing it to the repositories listed below.
- A llama model inference framework implemented in CUDA C++ ☆48 · Updated 4 months ago
- ☆16 · Updated last year
- ☆108 · Updated this week
- Triton documentation in Simplified Chinese ☆62 · Updated 2 months ago
- Transformer-related optimization, including BERT and GPT ☆17 · Updated last year
- PaddlePaddle "Escort Program" training camp ☆19 · Updated last week
- A practical way of learning Swizzle ☆16 · Updated 2 months ago
- Machine Learning Compiler Roadmap ☆43 · Updated last year
- A simple MFU (model FLOPs utilization) calculator for LLMs. ☆29 · Updated last month
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference. ☆35 · Updated 3 weeks ago
- ☆115 · Updated last year
- Multiple GEMM operators constructed with CUTLASS to support LLM inference. ☆17 · Updated 6 months ago
- GPTQ inference TVM kernel ☆38 · Updated 11 months ago
- ☆78 · Updated last year
- Courses on Bilibili ☆72 · Updated last year
- A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch. ☆34 · Updated 11 months ago
- LLM theoretical performance analysis tool supporting parameter-count, FLOPs, memory, and latency analysis. ☆81 · Updated 2 months ago
- My solutions to the assignments of CMU 10-714 Deep Learning Systems (2022) ☆36 · Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆130 · Updated last year
- A layered, decoupled deep learning inference engine ☆72 · Updated last month
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆51 · Updated 8 months ago
- Implements Flash Attention using CuTe. ☆74 · Updated 3 months ago
- ☆76 · Updated last week
- Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios. ☆35 · Updated last month
- A cross-chip-platform collection of operators and a unified neural network library. ☆16 · Updated last year
- A lightweight llama-like LLM inference framework based on Triton kernels. ☆103 · Updated 3 weeks ago
- ☆70 · Updated 2 years ago
- Transformer-related optimization, including BERT and GPT ☆59 · Updated last year
- ☆50 · Updated 2 months ago
- ☆65 · Updated 3 months ago