lenLRX / llm_simple
☆11 · Updated last month
Alternatives and similar repositories for llm_simple:
Users interested in llm_simple are comparing it to the libraries listed below.
- ☢️ TensorRT 2023 contest (second round): Llama model inference acceleration and optimization based on TensorRT-LLM ☆46 · Updated last year
- GPTQ inference TVM kernel ☆38 · Updated 11 months ago
- FP8 flash attention implemented with the CUTLASS library on the Ada architecture ☆60 · Updated 7 months ago
- Quantized Attention on GPU ☆45 · Updated 4 months ago
- A summary of systems papers, frameworks, code, and tools for training or serving large models ☆56 · Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆68 · Updated 9 months ago
- Flash Attention implemented with CuTe ☆74 · Updated 3 months ago
- Standalone Flash Attention v2 kernel without a libtorch dependency ☆108 · Updated 6 months ago
- OneFlow->ONNX ☆42 · Updated last year
- A standalone GEMM kernel for fp16 activations and quantized weights, extracted from FasterTransformer ☆90 · Updated last month
- Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators ☆18 · Updated this week
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference ☆35 · Updated 3 weeks ago
- Performance of the C++ interfaces of flash attention and flash attention v2 in large language model (LLM) inference scenarios ☆35 · Updated last month
- QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs ☆109 · Updated 2 weeks ago
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆54 · Updated 3 years ago
- Hands-on large-model deployment: TensorRT-LLM, Triton Inference Server, vLLM ☆26 · Updated last year
- A Llama model inference framework implemented in CUDA C++ ☆48 · Updated 4 months ago
- Multiple GEMM operators built with CUTLASS to support LLM inference ☆17 · Updated 6 months ago
- An easy-to-use package for implementing SmoothQuant for LLMs ☆95 · Updated 10 months ago