zeux / calm
CUDA/Metal accelerated language model inference
☆615 · Updated 4 months ago
Alternatives and similar repositories for calm
Users interested in calm are comparing it to the libraries listed below.
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆503 · Updated last month
- kernels, of the mega variety ☆579 · Updated 2 weeks ago
- Perplexity GPU Kernels ☆488 · Updated 3 weeks ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆941 · Updated 9 months ago
- A throughput-oriented high-performance serving framework for LLMs