HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
☆440 · Apr 27, 2026 · Updated last week
Alternatives and similar repositories for ai-infra-hpc
Users that are interested in ai-infra-hpc are comparing it to the libraries listed below.
- Implement a PyTorch-like DL library in C++ from scratch, step by step ☆270 · Apr 15, 2026 · Updated 3 weeks ago
- ☆13 · Jun 23, 2022 · Updated 3 years ago
- ☆13 · Jul 28, 2024 · Updated last year
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph ☆55 · Jul 3, 2022 · Updated 3 years ago
- From Minimal GEMM to Everything ☆201 · Feb 10, 2026 · Updated 2 months ago
- Mini CCL - A lightweight collective communication library ☆32 · Jan 2, 2026 · Updated 4 months ago
- The official implementation of CTRNet++. ☆15 · Dec 30, 2024 · Updated last year
- Slowist's notebook ☆17 · Mar 18, 2026 · Updated last month
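- A demonstration of autoTVM search for optimized neural-network inference code: it compiles the open-source centerface model with TVM, uses autoTVM to search for the best inference code, and deploys the result compiled as C++ code. The demo platform is CUDA, but other platforms are possible, such as Raspberry Pi, Android phones, and iPhones. This is a demonstration of … ☆30 · May 6, 2021 · Updated 5 years ago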
- How to learn PyTorch and OneFlow ☆497 · Mar 22, 2024 · Updated 2 years ago
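- A command-line tool for managing multiple Claude API configurations. Easily switch between API keys and base URLs for different environments or accounts. ☆25 · Aug 7, 2025 · Updated 8 months ago
- Generate a year-in-review summary of your browsing history! ☆18 · Dec 12, 2024 · Updated last year
- cc98 crawler ☆15 · Sep 1, 2013 · Updated 12 years ago
- Notes on the book "C++ Template Metaprogramming in Practice: A Preliminary Implementation of a Deep Learning Framework". ☆18 · Nov 5, 2022 · Updated 3 years ago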
- Hand-written CUDA operators and interview guide ☆956 · Aug 23, 2025 · Updated 8 months ago
- ☆23 · Aug 20, 2025 · Updated 8 months ago
- [BMVC2024] Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning ☆14 · Apr 29, 2026 · Updated last week
- [SynthText Chinese] Improved code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural I… ☆13 · Dec 8, 2022 · Updated 3 years ago
- An Automated Performance Optimization Framework for P4-Programmable SmartNICs ☆28 · Nov 18, 2023 · Updated 2 years ago
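- An AI job-preparation platform focused on large-scale search of interview experiences, résumé analysis, and mock interviews ☆131 · Mar 30, 2026 · Updated last month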
- ☆45 · May 4, 2025 · Updated last year
- Implementation of Baseline for Scene Text-to-Scene Text Translation ☆19 · Mar 30, 2025 · Updated last year
- ☆152 · Mar 18, 2024 · Updated 2 years ago
- GPU-accelerated LLM Training Simulator ☆18 · Jun 26, 2025 · Updated 10 months ago
- Solutions for Programming Massively Parallel Processors: A Hands-on Approach, 2nd edition ☆36 · Jun 4, 2022 · Updated 3 years ago
- How to optimize some algorithms in CUDA. ☆2,960 · Updated this week
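- A MapReduce implementation of the KNN algorithm on the Hadoop platform ☆12 · Jun 28, 2020 · Updated 5 years ago
- A Windows 11 desktop client for CC98, Zhejiang University's campus forum, built on WinUI 3. ☆33 · Apr 29, 2026 · Updated last week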
- See vLLM official support: https://github.com/vllm-project/vllm-ascend ☆11 · Feb 5, 2025 · Updated last year
- A Flexible Cache Architectural Simulator ☆17 · Sep 16, 2025 · Updated 7 months ago
- IPDK Networking Recipe (P4 Control Plane) ☆41 · Apr 27, 2026 · Updated last week
- Hand-Rolled GPU communications library ☆92 · Nov 25, 2025 · Updated 5 months ago
- A collection of compiler learning resources. ☆2,722 · Mar 19, 2025 · Updated last year
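- Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI software and hardware 🔧, and more ☆2,046 · Updated this week
- Experiment: llama2 inference implemented in Rust ☆17 · Feb 23, 2024 · Updated 2 years ago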
- Graph partitioning for distributed GNN training ☆13 · Mar 26, 2023 · Updated 3 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆96 · Feb 20, 2026 · Updated 2 months ago
- This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several… ☆1,290 · Jul 29, 2023 · Updated 2 years ago
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆10,825 · Apr 20, 2026 · Updated 2 weeks ago