shen-shanshan / cs-self-learning
This repo archives my notes, code, and materials from my CS learning.
☆38 · Updated this week
Alternatives and similar repositories for cs-self-learning
Users that are interested in cs-self-learning are comparing it to the libraries listed below
- A lightweight llama-like LLM inference framework built on Triton kernels. ☆134 · Updated this week
- UltraScale Playbook, Chinese edition. ☆45 · Updated 3 months ago
- LLM theoretical performance analysis tools, supporting parameter, FLOPs, memory, and latency analysis. ☆97 · Updated 3 weeks ago
- A llama model inference framework implemented in CUDA C++. ☆58 · Updated 8 months ago
- Softmax optimizations in Triton for many cases. ☆21 · Updated 10 months ago
- A simplified flash-attention implementation using CUTLASS, designed for teaching. ☆43 · Updated 11 months ago
- ☆31 · Updated 2 months ago
- Learning how CUDA works. ☆282 · Updated 4 months ago
- A lightweight inference framework for large language models. ☆20 · Updated last month
- A tutorial for CUDA & PyTorch. ☆148 · Updated 5 months ago
- How to learn PyTorch and OneFlow. ☆441 · Updated last year
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU. ☆48 · Updated 2 years ago
- Examples of CUDA implementations with CUTLASS CuTe. ☆203 · Updated last week
- Code and notes for the six major CUDA parallel computing patterns. ☆61 · Updated 4 years ago
- ☆137 · Updated last year
- A good project for campus recruiting (autumn/spring hiring) and internships: build, from scratch, an LLM inference framework supporting LLama2/3 and Qwen2.5. ☆381 · Updated last week
- EasyNN is a neural network inference framework built for teaching, aiming to let anyone write an inference framework on their own, even with zero background. ☆31 · Updated 10 months ago
- ☆149 · Updated 6 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance.⚡️ ☆86 · Updated 2 months ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆510 · Updated this week
- ☆21 · Updated 4 years ago
- ☆139 · Updated last year
- ☆128 · Updated 6 months ago
- ⚡️FFPA: Extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large head dimensions; 1.8x~3x↑ vs SDPA.🎉 ☆189 · Updated 2 months ago
- ☆27 · Updated last year
- Courses on Bilibili. ☆75 · Updated last year
- Triton documentation in Simplified Chinese / Triton 中文文档. ☆74 · Updated 2 months ago
- 📚200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉). ☆29 · Updated 2 months ago
- ☆50 · Updated last month
- ☆78 · Updated this week