A layered, decoupled deep learning inference engine
☆79Feb 17, 2025Updated last year
Alternatives and similar repositories for RefactorGraph
Users that are interested in RefactorGraph are comparing it to the libraries listed below.
- InfiniTensor is a high-performance inference engine tailored for GPUs and AI accelerators. Its design focuses on effective deployment and…☆312Mar 16, 2026Updated 3 weeks ago
- ☆126Jan 22, 2026Updated 2 months ago
- A CUDA runtime environment based on the CUDA Driver API☆16Jul 30, 2025Updated 8 months ago
- ffmpeg+cuvid+tensorrt+multicamera☆12Dec 31, 2024Updated last year
- ggml study notes; ggml is a machine learning inference framework☆18Mar 24, 2024Updated 2 years ago
- 🎉My Collections of CUDA Kernels~☆11Jun 25, 2024Updated last year
- ☆44Jan 8, 2025Updated last year
- Notes☆53Aug 15, 2025Updated 7 months ago
- A simple high performance CUDA GEMM implementation.☆430Jan 4, 2024Updated 2 years ago
- A plugin-oriented framework for video structuring. Chinese developers: add WeChat zhzhi78 to join the discussion group.☆18May 28, 2024Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆96Feb 20, 2026Updated last month
- Efficient inference of large language models.☆150Sep 28, 2025Updated 6 months ago
- A one-page-only, CGraph-API-like DAG project.☆26Feb 11, 2025Updated last year
- Traverses device tree binary objects☆15Nov 22, 2025Updated 4 months ago
- Hypervisor written in Rust for the RISC-V 1.0 hypervisor extension☆16Oct 21, 2024Updated last year
- HunyuanDiT with TensorRT and libtorch☆18May 22, 2024Updated last year
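- CLIP inference implemented in C++. The model is slightly modified; the changes and the model export code can be found in the README. The model files, including the AX650 models, are all in Releases. Support for ChineseCLIP has been added.☆31Jun 19, 2025Updated 9 months ago
- MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port☆484Oct 23, 2024Updated last year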
- A domain-specific language (DSL) based on Triton but providing higher-level abstractions.☆125Updated this week
- ☆120Apr 11, 2024Updated last year
- A llama model inference framework implemented in CUDA C++☆65Nov 8, 2024Updated last year
- SGEMM optimization with cuda step by step☆22Mar 23, 2024Updated 2 years ago
- ☆42Nov 29, 2022Updated 3 years ago
- Experiment: llama2 inference implemented in Rust☆17Feb 23, 2024Updated 2 years ago
- Nsight Compute In Docker☆13Dec 21, 2023Updated 2 years ago
- PointPillars TensorRT version pretrained on MMDetection3d with WaymoOpenDataset☆23Aug 11, 2022Updated 3 years ago
- ☆30Nov 16, 2024Updated last year
- My Paper Reading Lists and Notes.☆21Mar 28, 2026Updated last week
- A simple Transformer model implemented in C++. Attention Is All You Need.☆54Mar 11, 2021Updated 5 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated last year
- Serving Inside Pytorch☆171Feb 3, 2026Updated 2 months ago
- ☆68Updated this week
- ☆23Jan 3, 2024Updated 2 years ago
- ☆77Jan 25, 2025Updated last year
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 8 months ago
- Operator library☆17Jul 9, 2025Updated 9 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆250Mar 15, 2024Updated 2 years ago
- FP8 flash attention implemented on the Ada architecture using the cutlass repository☆82Aug 12, 2024Updated last year
- ☆53Updated this week