InfiniTensor / InfiniCore
☆26Updated last week
Alternatives and similar repositories for InfiniCore
Users who are interested in InfiniCore are comparing it to the libraries listed below.
- A light llama-like llm inference framework based on the triton kernel.☆160Updated last month
- ☆261Updated 2 weeks ago
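- A llama model inference framework implemented in CUDA C++☆62Updated 11 months ago
- EasyNN is a neural network inference framework developed for teaching, designed so that anyone can write an inference framework on their own, even with no prior background!☆33Updated last year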
- Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT]☆316Updated 2 years ago
- A layered, decoupled deep learning inference engine☆76Updated 8 months ago
- ☆37Updated last year
- ☆141Updated last year
- A simplified version of flash-attention implemented with cutlass, intended for teaching☆50Updated last year
- 🎉My Collections of CUDA Kernels~☆11Updated last year
- Courses on Bilibili☆76Updated 2 years ago
- ☆37Updated 5 months ago
- learning how CUDA works☆333Updated 7 months ago
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]☆74Updated 3 years ago
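- A good project for campus recruitment (autumn and spring hiring) and internships: build from scratch, hands-on, a large-model inference framework supporting LLama2/3 and Qwen2.5.☆439Updated 3 months ago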
- Some common CUDA kernel implementations (Not the fastest).☆28Updated 2 months ago
- Examples of CUDA implementations by Cutlass CuTe☆244Updated 4 months ago
- ☆33Updated last year
- A guide to hand-writing CUDA operators and interview preparation☆659Updated 2 months ago
- A tutorial for CUDA&PyTorch☆159Updated 9 months ago
- ☆109Updated 6 months ago
- hands on model tuning with TVM and profile it on a Mac M1, x86 CPU, and GTX-1080 GPU.☆50Updated 2 years ago
- Codes & examples for "CUDA - From Correctness to Performance"☆115Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks.☆132Updated 2 years ago
- ☆18Updated last year
- how to learn PyTorch and OneFlow☆458Updated last year
- ☆70Updated 9 months ago
- ☆115Updated last year
- easy cuda code☆87Updated 10 months ago
- A simple high performance CUDA GEMM implementation.☆411Updated last year