PFCCLab / Starter
[HACKATHON Preparatory Camp] PaddlePaddle Starter Program training camp
☆16 · Updated last week
Alternatives and similar repositories for Starter
Users interested in Starter are comparing it to the repositories listed below.
- PaddlePaddle Escort Program (飞桨护航计划) training camp ☆18 · Updated 2 weeks ago
- Activities of the PaddlePaddle framework study group ☆25 · Updated 2 years ago
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆71 · Updated last month
- PaddlePaddle Developer Community ☆111 · Updated this week
- PFCC community blog ☆11 · Updated this week
- A layered, decoupled deep learning inference engine ☆73 · Updated 3 months ago
- A llama model inference framework implemented in CUDA C++ ☆57 · Updated 6 months ago
- ☆7 · Updated 7 months ago
- 📚 FFPA (Split-D): extends FlashAttention with Split-D for large headdim, with O(1) GPU SRAM complexity; 1.8x~3x faster than SDPA EA ☆184 · Updated 3 weeks ago
- A lightweight llama-like LLM inference framework based on Triton kernels ☆122 · Updated this week
- PaddlePaddle Code Conversion Toolkit: a tool for converting deep learning code to PaddlePaddle ☆102 · Updated last week
- Just a template for quickly creating a Python library ☆8 · Updated last month
- PaddlePaddle custom device implementation ☆84 · Updated this week
- ☆238 · Updated 3 months ago
- ☆25 · Updated 2 months ago
- Tutorials for writing high-performance GPU operators in AI frameworks ☆130 · Updated last year
- A theoretical LLM performance analysis tool supporting parameter-count, FLOPs, memory, and latency analysis ☆92 · Updated last week
- Course materials for MIT 6.5940: TinyML and Efficient Deep Learning Computing ☆46 · Updated 4 months ago
- ☆63 · Updated this week
- Courses on Bilibili ☆75 · Updated last year
- My CS notes ☆50 · Updated 7 months ago
- ☆85 · Updated 2 months ago
- A practical way of learning Swizzle ☆19 · Updated 4 months ago
- Decoding Attention is optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference ☆36 · Updated 2 months ago
- ☆131 · Updated last month
- Code release for the book "Efficient Training in PyTorch" ☆66 · Updated last month
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance ☆79 · Updated 3 weeks ago
- Parallel Prefix Sum (Scan) with CUDA ☆21 · Updated 11 months ago
- A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch ☆33 · Updated last year
- Summary of the specs of commonly used GPUs for training and inference of LLMs ☆42 · Updated 2 months ago