harleyszhang / llm_counts
Theoretical performance analysis tool for LLMs, supporting parameter count, FLOPs, memory, and latency analysis.
☆76 Updated 3 weeks ago
Alternatives and similar repositories for llm_counts:
Users who are interested in llm_counts are comparing it to the libraries listed below.
- Examples of CUDA implementations by Cutlass CuTe ☆132 Updated 2 months ago
- learning how CUDA works ☆190 Updated 5 months ago
- Summary of some awesome work for optimizing LLM inference ☆51 Updated last month
- ☆142 Updated 3 weeks ago
- ☆106 Updated 10 months ago
- A lightweight LLaMA-like LLM inference framework based on Triton kernels.