AllenJWZhu / LlamaInferLinks

LLM Inference Engine: a high-performance, CUDA-accelerated framework for large language model (LLM) inference. A cutting-edge, open-source implementation of an LLM inference engine, optimized for consumer-grade hardware. This project showcases advanced techniques in GPU acceleration, memory management, and algorithmic optimization.

Alternatives and similar repositories for LlamaInfer

Users who are interested in LlamaInfer are comparing it to the libraries listed below.