dlsyscourse / hw2
☆8 · Updated 8 months ago
Alternatives and similar repositories for hw2
Users who are interested in hw2 are comparing it to the repositories listed below.
- A llama model inference framework implemented in CUDA C++ · ☆57 · Updated 7 months ago
- PaddlePaddle Escort Program training camp · ☆18 · Updated last month
- Triton documentation in Simplified Chinese · ☆71 · Updated 2 months ago
- Machine Learning Compiler Road Map · ☆43 · Updated last year
- ☆138 · Updated 2 months ago
- A lightweight llama-like LLM inference framework based on Triton kernels. · ☆130 · Updated 2 weeks ago
- Tutorials for writing high-performance GPU operators in AI frameworks. · ☆129 · Updated last year
- ☆38 · Updated last year
- A practical way of learning Swizzle · ☆20 · Updated 5 months ago
- [HACKATHON Preparatory Camp] PaddlePaddle Launch Program training camp · ☆16 · Updated last month
- ☆137 · Updated last year
- Free resources for the book "AI Compiler Development Guide" · ☆45 · Updated 2 years ago
- ☆30 · Updated last month
- Some HPC projects for learning · ☆22 · Updated 10 months ago
- Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing · ☆48 · Updated 5 months ago
- LLM theoretical performance analysis tools supporting parameter, FLOPs, memory, and latency analysis. · ☆96 · Updated 2 weeks ago
- ☆87 · Updated 3 months ago
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference. · ☆38 · Updated 3 weeks ago
- Code release for the book "Efficient Training in PyTorch" · ☆69 · Updated 2 months ago
- A PyTorch-like deep learning framework. Just for fun. · ☆157 · Updated last year
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… · ☆58 · Updated 11 months ago
- ☆70 · Updated 2 years ago
- Implement Flash Attention using CuTe. · ☆87 · Updated 6 months ago