dlsyscourse / hw2 (☆7, updated 6 months ago)
Alternatives and similar repositories for hw2:
Users interested in hw2 are comparing it to the repositories listed below.
- Triton documentation in Simplified Chinese (☆67, updated last week)
- A llama model inference framework implemented in CUDA C++ (☆50, updated 5 months ago)
- A practical way of learning Swizzle (☆18, updated 2 months ago)
- Training camp for the PaddlePaddle (飞桨) Escort Program (☆18, updated this week)
- Tutorials for writing high-performance GPU operators in AI frameworks (☆130, updated last year)
- Transformer-related optimizations, including BERT and GPT (☆17, updated last year)
- A summary of systems papers, frameworks, code, and tools for training or serving large models (☆56, updated last year)
- A lightweight llama-style LLM inference framework built on Triton kernels (☆108, updated this week)
- A PyTorch-like deep learning framework, just for fun (☆154, updated last year)
- Decoding Attention: optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference (☆36, updated 3 weeks ago)
- A simple MFU (model FLOPs utilization) calculator for LLMs (☆36, updated last month)
- [HACKATHON prep camp] Training camp for the PaddlePaddle (飞桨) Launch Program (☆16, updated this week)
- A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch (☆33, updated last year)
- Inference code for LLaMA models (☆120, updated last year)
- A machine learning compiler roadmap (☆43, updated last year)
- Courses hosted on Bilibili (☆74, updated last year)
- A hierarchically decoupled deep learning inference engine (☆72, updated 2 months ago)
- My solutions to the assignments of CMU 10-714 Deep Learning Systems 2022 (☆36, updated last year)
- PyTorch bindings for CUTLASS grouped GEMM (☆120, updated 3 months ago)
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… (☆53, updated 8 months ago)
- Code base and slides for ECE408: Applied Parallel Programming on GPU (☆122, updated 3 years ago)
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (☆98, updated last year)
- Multiple GEMM operators built with CUTLASS to support LLM inference (☆17, updated 7 months ago)
- GPTQ inference TVM kernel (☆38, updated last year)