chrjxj / awesome-gpu-notes
☆14Updated 9 months ago
Alternatives and similar repositories for awesome-gpu-notes
Users who are interested in awesome-gpu-notes are comparing it to the repositories listed below.
- symmetric int8 gemm☆66Updated 5 years ago
- An easy way to run, test, benchmark and tune OpenCL kernel files☆23Updated last year
- Serving Inside Pytorch☆160Updated 3 weeks ago
- how to design cpu gemm on x86 with avx256, that can beat openblas.☆70Updated 6 years ago
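- autoTVM neural network inference code optimization search demo: compiles the open-source model centerface with tvm, uses autoTVM to search for the optimal inference code, and finally deploys it compiled as C++ code. The demo platform is cuda, but it can be other platforms, such as Raspberry Pi, Android phones, and iPhones. This is a demonstration of …☆27Updated 4 years ago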
- Tencent NCNN with added CUDA support☆69Updated 4 years ago
- My learning notes about AI, including Machine Learning and Deep Learning.☆18Updated 5 years ago
- Use PyTorch model in C++ project☆139Updated 3 years ago
- ☆75Updated 2 years ago
- ☆23Updated last year
- Simple Dynamic Batching Inference☆145Updated 3 years ago
- notes on reading tensorflow source code☆13Updated 6 years ago
- tensorflow mnist demo api interface, including grpc, flask, webpy, tornado, django, rabbitMQ, redis, celery, tf serving, freeze_optimize_quantize☆20Updated 3 years ago
- TensorFlow Quantization Example, for TensorFlow Lite☆18Updated 5 years ago
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo…☆64Updated last week
- Benchmark code for the "Online normalizer calculation for softmax" paper☆94Updated 6 years ago
- AI Infra LLM infer/ tensorrt-llm/ vllm☆20Updated 5 months ago
- ☆71Updated 2 years ago
- kmeans clustering with multi-GPU capabilities☆119Updated 2 years ago
- ☆67Updated 11 years ago
- Code and notes for the 6 major CUDA parallel computing patterns☆61Updated 4 years ago
- Wanwu models release, code will be released soon☆24Updated 2 years ago
- Transformer related optimization, including BERT, GPT☆17Updated last year
- Study notes on the CUDA Programming Guide☆29Updated 6 years ago
- implement bert in pure c++☆36Updated 5 years ago
- Simple examples of using bazel to cross compile AI applications for armv7hf devices.☆25Updated 3 years ago
- ☢️ TensorRT 2023 second round: Llama model inference acceleration and optimization based on TensorRT-LLM☆48Updated last year
- A small deep-learning framework with C++/Python/CUDA☆54Updated 7 years ago
- Transformer related optimization, including BERT, GPT☆39Updated 2 years ago
- Common libraries for PPL projects☆29Updated 2 months ago