zhihu / TLLM_QMM
TLLM_QMM strips out the quantized-kernel implementations from Nvidia's TensorRT-LLM, removing the NVInfer dependency and exposing them as an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with the new FP8 quantization.
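To illustrate the kind of group-wise weight quantization that AWQ/GPTQ-style kernels consume, here is a minimal NumPy sketch. This is not TLLM_QMM's actual API; the function names and the asymmetric uint4 layout are assumptions chosen for clarity, and real kernels would additionally pack two 4-bit values per byte.

```python
import numpy as np

def quantize_groupwise(w, group_size=128, bits=4):
    # Asymmetric group-wise quantization (AWQ/GPTQ-style sketch).
    # w: (out_features, in_features) float32 weight; in_features must
    # be divisible by group_size.
    qmax = (1 << bits) - 1                       # 15 for 4-bit
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    wmin = g.min(axis=-1, keepdims=True)
    wmax = g.max(axis=-1, keepdims=True)
    scale = (wmax - wmin) / qmax                 # per-group scale
    scale = np.where(scale == 0, 1.0, scale)     # guard constant groups
    zero = np.round(-wmin / scale)               # per-group zero point
    q = np.clip(np.round(g / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize_groupwise(q, scale, zero, shape):
    # Inverse mapping applied (in real kernels, fused into the GEMM).
    return ((q.astype(np.float32) - zero) * scale).reshape(shape)
```

The per-group scale and zero point are exactly what the dequantization step in such kernels reads alongside the packed weights; aligning their layout across AWQ, GPTQ, and FP8 paths is the preprocessing work the description refers to.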
☆16 · Updated last year
Alternatives and similar repositories for TLLM_QMM
Users that are interested in TLLM_QMM are comparing it to the libraries listed below
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆98 · Updated 2 years ago
- PyTorch distributed training acceleration framework ☆53 · Updated 3 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 · Updated last year
- ☆130 · Updated 11 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆118 · Updated 6 months ago
- ☆514 · Updated 2 weeks ago
- GLake: optimizing GPU memory management and IO transmission. ☆490 · Updated 8 months ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆276 · Updated 3 months ago
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training. ☆270 · Updated 2 years ago
- ☆152 · Updated 10 months ago
- A high-performance framework for training wide-and-deep recommender systems on heterogeneous cluster ☆159 · Updated last year
- KV cache store for distributed LLM inference ☆368 · Updated 3 weeks ago
- Pipeline Parallelism Emulation and Visualization ☆72 · Updated 5 months ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆181 · Updated last month
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆267 · Updated 3 months ago
- DeepRec Extension is an easy-to-use, stable and efficient large-scale distributed training system based on DeepRec. ☆11 · Updated last year
- optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052 ☆477 · Updated last year
- ☆140 · Updated last year
- ☆47 · Updated 11 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆70 · Updated last week
- ☆97 · Updated 8 months ago
- ☆219 · Updated 2 years ago
- LLM training technologies developed by kwai ☆66 · Updated last week
- ☆205 · Updated 7 months ago
- Transformer related optimization, including BERT, GPT ☆59 · Updated 2 years ago
- DeepSeek-V3/R1 inference performance simulator ☆169 · Updated 8 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆934 · Updated this week
- ☆26 · Updated 10 months ago
- ☆57 · Updated 5 years ago
- Sequence-level 1F1B schedule for LLMs. ☆37 · Updated 3 months ago