OpenBMB / CPM.cu

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and quantization.
225 · Updated 3 weeks ago

Alternatives and similar repositories for CPM.cu

Users who are interested in CPM.cu are comparing it to the libraries listed below.
