AllenJWZhu / LlamaInfer
LLM Inference Engine: a high-performance, CUDA-accelerated framework for large language model (LLM) inference. A cutting-edge, open-source implementation of an LLM inference engine, optimized for consumer-grade hardware. This project showcases advanced techniques in GPU acceleration, memory management, and algorithmic optimization.
11 stars · Sep 29, 2024 · Updated last year

Alternatives and similar repositories for LlamaInfer

Users interested in LlamaInfer are comparing it to the libraries listed below.
