TRT2022 / trtllm-llama
☢️ TensorRT 2023 final round: Llama model inference acceleration and optimization based on TensorRT-LLM
☆44 · Updated last year
Related projects
Alternatives and complementary repositories for trtllm-llama
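- NVIDIA TensorRT Hackathon 2023 final-round topic: building and optimizing Tongyi Qianwen Qwen-7B with TensorRT-LLM ☆40 · Updated last year
- Tianchi NVIDIA TensorRT Hackathon 2023, Generative AI Model Optimization track: third-place solution in the preliminary round ☆47 · Updated last year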
- ☆23 · Updated last year
- Transformer-related optimization, including BERT, GPT ☆17 · Updated last year
- Simplify large (>2 GB) ONNX models ☆42 · Updated 8 months ago
- Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores in the decoding stage of LLM inference. ☆24 · Updated this week
- ☆56 · Updated this week
- ☆90 · Updated last year
- Export Llama to ONNX ☆95 · Updated 5 months ago
- ☆140 · Updated 6 months ago
- ☆117 · Updated last year
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆26 · Updated 2 months ago
- Large language model ONNX inference framework ☆24 · Updated 3 weeks ago
- Run ChatGLM2-6B on BM1684X ☆48 · Updated 8 months ago
- LLM deployment project based on ONNX ☆26 · Updated last month
- Hands-on large-model deployment: TensorRT-LLM, Triton Inference Server, vLLM ☆26 · Updated 8 months ago
- ☆123 · Updated this week
- Translate networks from different platforms into an intermediate representation (IR) ☆44 · Updated 6 years ago
- Serving inside PyTorch ☆142 · Updated this week
- ggml study notes (ggml is a machine learning inference framework) ☆11 · Updated 7 months ago
- OneFlow->ONNX ☆42 · Updated last year
- TensorRT 2022 final-round solution: TensorRT inference optimization of MST++, the first Transformer-based image reconstruction model ☆135 · Updated 2 years ago
- ☆32 · Updated 3 weeks ago
- ☆97 · Updated 7 months ago
- A simplified flash-attention implementation using cutlass, intended for teaching ☆31 · Updated 2 months ago
- ☆70 · Updated last year
- Transformer-related optimization, including BERT, GPT ☆60 · Updated last year
- Collection of blogs on AI development ☆14 · Updated 3 months ago
- LLM notes covering model inference, Transformer model structure, and code analysis of the lightllm framework ☆27 · Updated this week
- Compare multiple optimization methods on Triton to improve model serving performance ☆46 · Updated 10 months ago