HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
☆472Apr 27, 2026Updated 3 weeks ago
Alternatives and similar repositories for ai-infra-hpc
Users that are interested in ai-infra-hpc are comparing it to the libraries listed below.
- Implement a PyTorch-like DL library in C++ from scratch, step by step☆291Apr 15, 2026Updated last month
- ☆16Nov 26, 2020Updated 5 years ago
- ☆13Jun 23, 2022Updated 3 years ago
- ☆24Jan 21, 2026Updated 4 months ago
- AI Infra study notes with complete high-resolution diagrams; recommended learning roadmap☆188Feb 27, 2026Updated 2 months ago
- My study notes on Stable Diffusion WebUI (using Google Colaboratory)☆10Oct 5, 2023Updated 2 years ago
- High-performance RDMA-based distributed feature collection component for training GNN models on EXTREMELY large graphs☆55Jul 3, 2022Updated 3 years ago
- From Minimal GEMM to Everything☆207Updated this week
- Mini CCL - A lightweight collective communication library☆32Jan 2, 2026Updated 4 months ago
- Hand-written CUDA operators and interview guide☆981Aug 23, 2025Updated 9 months ago
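- A demonstration of autoTVM search-based optimization of neural network inference code: it compiles the open-source model centerface with TVM, uses autoTVM to search for the best inference code, and finally deploys the result compiled as C++ code. The demo platform is CUDA, but other platforms such as Raspberry Pi, Android phones, or iPhones can be used. This is a demonstration of …☆30May 6, 2021Updated 5 years ago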
- How to learn PyTorch and OneFlow☆496Updated this week
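- A command-line tool for managing multiple Claude API configurations. It makes it easy to switch between the API keys and base URLs of different environments or accounts.☆26Aug 7, 2025Updated 9 months ago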
- ☆23Aug 20, 2025Updated 9 months ago
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.☆79May 18, 2026Updated last week
- My production internship project: image style transfer implemented with a GAN☆13Jul 21, 2021Updated 4 years ago
- An Automated Performance Optimization Framework for P4-Programmable SmartNICs☆28Nov 18, 2023Updated 2 years ago
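- A Spring Boot-based crawler system for BOSS Zhipin job listings, providing automated job information collection and data processing. The system uses a modern technology stack, including the Spring Boot framework, an SQLite database, and a RESTful API design, and implements intelligent anti-crawler strategies and efficient data parsing. The system can…☆22Mar 16, 2025Updated last year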
- "Write Your Own AI Compiler" (《自己动手写AI编译器》)☆38Oct 19, 2024Updated last year
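- An AI job-search preparation platform focused on searching a large corpus of interview experiences, résumé analysis, and mock interviews☆137Mar 30, 2026Updated last month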
- ☆154Mar 18, 2024Updated 2 years ago
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"☆15Aug 26, 2025Updated 8 months ago
- Extending BookSim2.0 and HotSpot6.0 for Power, Performance and Thermal evaluation of 3D NoC Architectures☆14Aug 9, 2019Updated 6 years ago
- Full-stack AI Infra learning materials for beginners starting from zero: https://caomaolufei.github.io/AIInfraGuide/☆615Updated this week
- How to optimize some algorithms in CUDA.☆2,998Updated this week
- Flutter embedder for Tizen☆13May 13, 2026Updated last week
- A MapReduce implementation of the KNN algorithm on the Hadoop platform☆12Jun 28, 2020Updated 5 years ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend☆11Feb 5, 2025Updated last year
- A Flexible Cache Architectural Simulator☆17Sep 16, 2025Updated 8 months ago
- IPDK Networking Recipe (P4 Control Plane)☆41May 11, 2026Updated 2 weeks ago
- ☆12Jul 7, 2021Updated 4 years ago
- Official codebase for our paper "Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks"☆12Jun 30, 2021Updated 4 years ago
- A collection of compiler learning resources.☆2,736Updated this week
- E-commerce visual copywriting design SOP Skill - 6-step workflow + compliance rule library + self-review scoring mechanism☆266May 4, 2026Updated 3 weeks ago
- Codebase for the Paper "Deep Semi-supervised Learning (SSL) for Time Series Classification (TSC)" to appear at the ICMLA '21☆11Mar 15, 2022Updated 4 years ago
- Graph partitioning for distributed GNN training☆13Mar 26, 2023Updated 3 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆96Feb 20, 2026Updated 3 months ago
- This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…☆1,303Jul 29, 2023Updated 2 years ago
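- Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more☆2,337May 8, 2026Updated 2 weeks ago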