OpenBMB / CPM.cu

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and quantization.
129 · Updated this week

Alternatives and similar repositories for CPM.cu

Users who are interested in CPM.cu are comparing it to the libraries listed below.
