FelixFu520 / README
A pupil in the computer world. (Felix Fu)
☆172Updated 3 months ago
Related projects:
- A tutorial for CUDA&PyTorch☆110Updated last week
- Tutorials for writing high-performance GPU operators in AI frameworks.☆118Updated last year
- ☆90Updated 6 months ago
- learning how CUDA works☆150Updated last month
- how to learn PyTorch and OneFlow☆329Updated 5 months ago
- flash attention tutorial written in python, triton, cuda, cutlass☆159Updated 3 months ago
- Disaggregated serving system for Large Language Models (LLMs).☆278Updated last month
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆71Updated 6 months ago
- A self-learning tutorial for CUDA High Performance Programming.☆119Updated 2 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch☆226Updated last week
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…☆153Updated last week
- An awesome GPU task scheduler. A lightweight, easy-to-use task scheduling tool for GPU clusters. Give it a star if you find it useful.☆159Updated 2 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…☆276Updated last week
- Curated collection of papers in machine learning systems☆123Updated last month
- FlagGems is an operator library for large language models implemented in Triton Language.☆246Updated last week
- ☆28Updated last year
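- A good project for campus recruiting (fall and spring hiring) and internships: build a large-model inference framework that supports LLama from scratch, hands-on.☆170Updated this week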
- ☆140Updated 4 months ago
- A baseline repository of Auto-Parallelism in Training Neural Networks☆138Updated 2 years ago
- Learn large models through pictures☆148Updated last month
- This repository is established to store personal notes and annotated papers during daily research.☆78Updated last week
- A CUDA learning path based on the book "CUDA Programming: Basics and Practice" (《cuda编程-基础与实践》, by Fan Zheyong).☆218Updated 8 months ago
- Xiao's CUDA Optimization Guide [Actively Adding New Content]☆223Updated last year
- ☆133Updated 2 months ago
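- Notes on deep learning systems, covering the mathematical foundations of deep learning, detailed explanations of basic neural network components, deep learning training strategies, detailed explanations of model compression algorithms, and large-model fundamentals with inference performance optimization analysis.☆345Updated this week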
- FlagScale is a large model toolkit based on open-sourced projects.☆129Updated last week
- Papers and their code for AI systems☆202Updated 3 weeks ago
- A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch.☆25Updated 5 months ago
- Code base and slides for ECE408:Applied Parallel Programming On GPU.☆113Updated 3 years ago
- ☆251Updated last week