brucecui1998 / brucecui1998.github.io
☆11Updated this week
Related projects:
- A layered, decoupled deep learning inference engine☆58Updated 3 weeks ago
- Solutions to Programming Massively Parallel Processors, 2nd edition☆26Updated 2 years ago
- A simple neural network inference framework☆23Updated last year
- Courses from Bilibili☆69Updated last year
- Code and notes for six major CUDA parallel computing patterns☆57Updated 4 years ago
- High-performance computing☆20Updated 4 years ago
- ☆11Updated last year
- A lightweight, header-only DAG project with a CGraph-like API.☆12Updated this week
- ☆20Updated 3 months ago
- Llama 2 inference☆35Updated 10 months ago
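- autoTVM demo of searching for optimized neural network inference code: compiles the open-source centerface model with tvm, uses autoTVM to search for the best inference code, and finally deploys it compiled as C++ code. The demo platform is CUDA, but other platforms such as Raspberry Pi, Android phones, and iPhones also work. This is a demonstration of …☆27Updated 3 years ago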
- Tutorials for writing high-performance GPU operators in AI frameworks.☆118Updated last year
- ☢️ TensorRT 2023 second round: accelerating and optimizing Llama model inference with TensorRT-LLM☆40Updated 11 months ago
- A C++ dataflow parallel processing framework☆21Updated 3 years ago
- ☆18Updated 3 years ago
- A deep learning framework built from scratch☆24Updated 2 years ago
- LLM deployment in practice: TensorRT-LLM, Triton Inference Server, vLLM☆25Updated 6 months ago
- A parallel computing project based on OpenMP and CUDA: a mosaic generator☆7Updated 5 years ago
- Machine Learning Compiler Road Map☆40Updated last year
- ☆52Updated this week
- A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)☆71Updated 2 months ago
- ☆71Updated last year
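- Study notes on the CUDA programming guide☆26Updated 5 years ago
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆39Updated 11 months ago
- Thoroughly understand BP backpropagation: 15 lines of code, also simple to implement in C++, 98.29% accuracy on MNIST classification☆33Updated 2 years ago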
- ☆28Updated last year
- ☆34Updated 2 years ago
- DGEMM on KNL, achieving 75% of MKL performance☆15Updated 2 years ago
- A simple Transformer model implemented in C++. Attention Is All You Need.☆37Updated 3 years ago
- ggml study notes; ggml is an inference framework for machine learning☆11Updated 5 months ago