winston779 / suyunti
Official website address for 速云梯 (suyunti)
☆24 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for suyunti
- Deploying a large language model on Android phones with MNN-llm: Qwen1.5-0.5B-Chat ☆48 Updated 7 months ago
- CPU Memory Compiler and Parallel Programming ☆24 Updated this week
- LLM101n: Let's build a Storyteller (Chinese edition) ☆118 Updated 3 months ago
- Tencent Distribution of TVM ☆15 Updated last year
- ☢️ TensorRT Hackathon 2023 finals: inference acceleration and optimization of the Llama model based on TensorRT-LLM ☆44 Updated last year
- 🎉 CUDA notes / roundup of frequently asked interview questions / C++ notes; personal notes, updated sporadically: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc. (a minimal warp-reduce sketch follows this list) ☆16 Updated 9 months ago
- Free resources for the book AI Compiler Development Guide ☆40 Updated last year
- Code and notes for the six major CUDA parallel computing patterns ☆58 Updated 4 years ago
- Introductory materials for MXMACA ☆14 Updated 5 months ago
- Run ChatGLM2-6B on the BM1684X ☆48 Updated 8 months ago
- NVIDIA TensorRT Hackathon 2023 finals topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM ☆40 Updated last year
- miemienet is a C++ deep learning inference framework. Supports PPYOLOE and PICODET. ☆11 Updated 2 years ago
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration ☆38 Updated 5 months ago
- This project covers convolution operator optimization on GPUs, including GEMM-based (implicit GEMM) convolution. ☆20 Updated 2 months ago
- A self-study tutorial for CUDA high-performance programming. ☆257 Updated last week
- Tongyi Qianwen (Qwen) VLLM inference deployment demo ☆443 Updated 7 months ago
- Examples of CUDA implementations using CUTLASS CuTe ☆98 Updated last week
- Swin Transformer C++ Implementation ☆54 Updated 3 years ago
- Implementation of FlashAttention in PyTorch ☆123 Updated last year
- AIFoundation mainly refers to what happens when AI systems meet large models: the full-stack core technologies for supporting large-model training and inference at the system level, from the bottom layer to the top. ☆289 Updated last month
- This repository targets performance optimization of the OpenCL GEMM function. It compares several libraries: clBLAS, CLBlast, MIOpenGemm, Inte… ☆16 Updated 5 years ago
- Performance of the C++ interface of FlashAttention and FlashAttention v2 in large language model (LLM) inference scenarios. ☆14 Updated last year
- Code & examples for "CUDA - From Correctness to Performance" ☆70 Updated 3 weeks ago
- A concise TensorRT tutorial ☆25 Updated 3 years ago
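
Several of the CUDA repositories above (the notes repo listing warp reduce / block reduce, the parallel-pattern collections, the self-study tutorial) revolve around the same reduction pattern. As a minimal sketch, assuming CUDA 9+ shuffle intrinsics, here is how a warp-level reduce composes into a block-wide sum; the kernel and names below are illustrative, not taken from any listed repository:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reduce a value across the 32 lanes of a warp with shuffle intrinsics.
__device__ float warp_reduce_sum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;  // lane 0 ends up holding the warp-wide sum
}

// Block-wide sum: warp reduce, stage per-warp partials in shared memory,
// then reduce those partials inside the first warp.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float partial[32];  // one slot per possible warp in the block
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (tid < n) ? in[tid] : 0.0f;
    v = warp_reduce_sum(v);
    int lane = threadIdx.x % 32, warp = threadIdx.x / 32;
    if (lane == 0) partial[warp] = v;
    __syncthreads();
    if (warp == 0) {
        int nwarps = (blockDim.x + 31) / 32;
        v = (lane < nwarps) ? partial[lane] : 0.0f;
        v = warp_reduce_sum(v);
        if (lane == 0) atomicAdd(out, v);  // accumulate across blocks
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;
    block_sum<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("sum = %.1f (expected %d)\n", *out, n);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The same two-level structure (warp shuffle, then shared-memory handoff) underlies the sgemv, dot-product, softmax, and layernorm kernels those repositories cover; only the per-element operation changes.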