A layered, decoupled deep learning inference engine
☆78Feb 17, 2025Updated last year
Alternatives and similar repositories for RefactorGraph
Users who are interested in RefactorGraph are comparing it to the libraries listed below
- ☆291Feb 4, 2026Updated last month
- ☆125Jan 22, 2026Updated last month
- A CUDA runtime environment based on the CUDA Driver API☆14Jul 30, 2025Updated 7 months ago
- ffmpeg+cuvid+tensorrt+multicamera☆11Dec 31, 2024Updated last year
- ggml study notes; ggml is a machine learning inference framework☆17Mar 24, 2024Updated last year
- ☆43Jan 8, 2025Updated last year
- 🎉My Collections of CUDA Kernels~☆10Jun 25, 2024Updated last year
- Notes☆52Aug 15, 2025Updated 7 months ago
- A simple high performance CUDA GEMM implementation.☆426Jan 4, 2024Updated 2 years ago
- A plugin-oriented framework for video structuring. Developers in China are welcome to add WeChat zhzhi78 to join the group chat.☆18May 28, 2024Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆95Feb 20, 2026Updated 3 weeks ago
- A domain-specific language (DSL) based on Triton but providing higher-level abstractions.☆41Mar 5, 2026Updated 2 weeks ago
- Efficient inference of large language models.☆149Sep 28, 2025Updated 5 months ago
- A one-page-only CGraph-API-like DAG project.☆25Feb 11, 2025Updated last year
- Traverses device tree binary objects (device tree blobs)☆14Nov 22, 2025Updated 3 months ago
- Hypervisor written in Rust for the RISC-V 1.0 hypervisor extension☆16Oct 21, 2024Updated last year
- HunyuanDiT with TensorRT and libtorch☆17May 22, 2024Updated last year
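- CLIP inference implemented in C++. The model is slightly modified; the changes and the model export code are described in the README, and the model files, including the AX650 models, are in Releases. Now also supports ChineseCLIP.☆30Jun 19, 2025Updated 9 months ago
- MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port☆484Oct 23, 2024Updated last year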
- ☆120Apr 11, 2024Updated last year
- A llama model inference framework implemented in CUDA C++☆63Nov 8, 2024Updated last year
- SGEMM optimization with cuda step by step☆21Mar 23, 2024Updated last year
- ☆40Updated this week
- ☆42Nov 29, 2022Updated 3 years ago
- Experiment: llama2 inference implemented in Rust☆17Feb 23, 2024Updated 2 years ago
- Nsight Compute In Docker☆13Dec 21, 2023Updated 2 years ago
- ☆29Nov 16, 2024Updated last year
- PointPillars TensorRT version pretrained on MMDetection3d with WaymoOpenDataset☆21Aug 11, 2022Updated 3 years ago
- My Paper Reading Lists and Notes.☆21Mar 13, 2026Updated last week
- A simple Transformer model implemented in C++. Attention Is All You Need.☆54Mar 11, 2021Updated 5 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆31Apr 2, 2025Updated 11 months ago
- Serving Inside Pytorch☆170Feb 3, 2026Updated last month
- ☆23Jan 3, 2024Updated 2 years ago
- FP8 flash attention on the Ada architecture, implemented with the cutlass repository☆80Aug 12, 2024Updated last year
- ☆74Jan 25, 2025Updated last year
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆19Aug 3, 2025Updated 7 months ago
- Operator library☆17Jul 9, 2025Updated 8 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆251Mar 15, 2024Updated 2 years ago
- ☆52Updated this week