shen-shanshan / cs-self-learning
This repo archives my notes, code, and materials from my CS self-learning.
☆43 Updated last week
Alternatives and similar repositories for cs-self-learning
Users interested in cs-self-learning are comparing it to the repositories listed below.
- A lightweight LLaMA-like LLM inference framework based on Triton kernels. ☆143 Updated this week
- Chinese translation of the UltraScale Playbook. ☆47 Updated 4 months ago
- How to learn PyTorch and OneFlow. ☆445 Updated last year
- LLM theoretical performance analysis tools, supporting parameter, FLOPs, memory, and latency analysis. ☆99 Updated 3 weeks ago
- A LLaMA model inference framework implemented in CUDA C++. ☆58 Updated 8 months ago
- ☆31 Updated 2 months ago
- Optimized softmax implementations in Triton for many cases. ☆21 Updated 10 months ago
- ☆137 Updated last year
- Learning how CUDA works. ☆291 Updated 5 months ago
- A great project for campus recruiting (fall/spring hiring) and internships: build, from scratch, an LLM inference framework supporting LLaMA 2/3 and Qwen2.5. ☆396 Updated last month
- ☆149 Updated 6 months ago
- A tutorial for CUDA & PyTorch. ☆150 Updated 6 months ago
- A simplified flash-attention implemented with CUTLASS, intended for teaching. ☆44 Updated 11 months ago
- Examples of CUDA implementations with CUTLASS CuTe. ☆211 Updated last month
- A self-learning tutorial for CUDA high-performance programming. ☆690 Updated last month
- ☆80 Updated last week
- Implementing custom operators in PyTorch with CUDA/C++. ☆65 Updated 2 years ago
- A lightweight LLM inference framework. ☆20 Updated 2 months ago
- ☆128 Updated 7 months ago
- ☆21 Updated 4 years ago
- ☆145 Updated 4 months ago
- Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Basics and Practice). ☆127 Updated 3 years ago
- ☆67 Updated 6 months ago
- Code and notes for six major CUDA parallel computing patterns. ☆60 Updated 5 years ago
- ☆139 Updated last year
- FlagGems is an operator library for large language models implemented in the Triton language. ☆635 Updated this week
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX 1080 GPU. ☆49 Updated 2 years ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V…" ☆522 Updated this week
- ☆24 Updated 4 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆103 Updated 2 months ago