shen-shanshan / cs-self-learningLinks

This repo is used for archiving my notes, codes and materials of cs learning.

☆53

Alternatives and similar repositories for cs-self-learning

Users that are interested in cs-self-learning are comparing it to the libraries listed below

Sorting:

harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆153Updated 2 weeks ago
pprp / ultrascale-playbook-zh
UltraScale Playbook 中文版
☆77Updated 6 months ago
harleyszhang / llm_counts
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
☆108Updated 2 months ago
ifromeast / cuda_learning
learning how CUDA works
☆323Updated 7 months ago
cr7258 / ai-infra-learning
This repository organizes materials, recordings, and schedules related to AI-infra learning meetings.
☆170Updated 2 weeks ago
iclementine / optimize_softmax
Optimize softmax in triton in many cases
☆21Updated last year
BBuf / how-to-learn-deep-learning-framework
how to learn PyTorch and OneFlow
☆456Updated last year
PaddleJitLab / CUDATutorial
A self-learning tutorail for CUDA High Performance Programing.
☆748Updated 3 months ago
RussWong / CUDATutorial
A CUDA tutorial to make people learn CUDA program from 0
☆251Updated last year
zjhellofss / KuiperLLama
校招、秋招、春招、实习好项目，带你从零动手实现支持LLama2/3和Qwen2.5的大模型推理框架。
☆426Updated 3 months ago
caiwanxianhust / FasterLLaMA
使用 CUDA C++ 实现的 llama 模型推理框架
☆62Updated 11 months ago
FlagOpen / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆690Updated last week
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆240Updated 3 months ago
RussWong / LLM-engineering
☆25Updated last month
DefTruth / CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
☆45Updated 5 months ago
OpenPPL / ppl.llm.kernel.cuda
☆150Updated 8 months ago
weishengying / tiny-flash-attention
使用 cutlass 实现 flash-attention 精简版，具有教学意义
☆49Updated last year
ModelTC / LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
☆579Updated last month
OpenPPL / ppl.nn.llm
☆140Updated last year
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆480Updated 6 months ago
Tencent / KsanaLLM
☆503Updated 3 weeks ago
harleyszhang / llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.
☆827Updated 3 weeks ago
66RING / tiny-flash-attention
flash attention tutorial written in python, triton, cuda, cutlass
☆425Updated 4 months ago
KarhouTam / cuda-kernels
Some common CUDA kernel implementations (Not the fastest).
☆25Updated last month
CalvinXKY / BasicCUDA
A tutorial for CUDA&PyTorch
☆155Updated 8 months ago
OpenPPL / ppl.llm.serving
☆129Updated 9 months ago
OpenPPL / ppl.pmx
☆59Updated 10 months ago
zjhellofss / triton_course
☆36Updated 4 months ago
AyakaGEMM / Hands-on-GEMM
☆139Updated last year
DeepLink-org / DIOPI
☆75Updated 10 months ago