harleyszhang / llm_note
LLM notes, including model inference, transformer model structure, and lightllm framework code analysis notes
☆42Updated this week
Related projects
Alternatives and complementary repositories for llm_note
- ☆57Updated 2 weeks ago
- Async inference for machine learning models☆26Updated 2 years ago
- TensorRT encapsulation, learn, rewrite, practice.☆24Updated 2 years ago
- ☆99Updated 8 months ago
- ☆23Updated last year
- ☆140Updated 6 months ago
- This code accompanies the Bilibili video https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7.☆60Updated last year
- ☆138Updated 2 weeks ago
- ☆123Updated 2 weeks ago
- learning how CUDA works☆169Updated 3 months ago
- ☆32Updated last month
- ☆96Updated 3 years ago
- ☆228Updated 2 years ago
- ☢️ TensorRT 2023 second round: Llama model inference acceleration and optimization based on TensorRT-LLM☆44Updated last year
- Compare multiple optimization methods on Triton to improve model service performance☆46Updated 10 months ago
- TensorRT 2022 second-round solution: TensorRT inference optimization of MST++, the first Transformer-based image reconstruction model☆135Updated 2 years ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V…☆322Updated this week
- FlagGems is an operator library for large language models implemented in Triton Language.☆342Updated this week
- llm-export can export LLM models to ONNX.☆230Updated last week
- ☆38Updated 2 years ago
- Simplify large (>2GB) ONNX models☆44Updated 8 months ago
- ☆14Updated 6 months ago
- FP8 flash attention implemented on the Ada architecture using the cutlass repository☆52Updated 3 months ago
- ☆26Updated last year
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆40Updated last year
- Courses on Bilibili☆70Updated last year
- OneFlow->ONNX☆42Updated last year
- LLM deployment project based on ONNX.☆26Updated last month
- A simplified flash-attention implementation using cutlass, intended for teaching☆32Updated 3 months ago
- A CUDA tutorial for learning CUDA programming from scratch☆194Updated 4 months ago