ForceInjection / AI-fundermentals
AI fundamentals - GPU architecture, CUDA programming, and large language model basics
☆67 · Updated 3 months ago
Alternatives and similar repositories for AI-fundermentals:
Users that are interested in AI-fundermentals are comparing it to the libraries listed below
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (a minimal roofline sketch follows this list). ☆91 · Updated 10 months ago
- Efficient and easy multi-instance LLM serving ☆284 · Updated last week
- FlagScale is a large model toolkit based on open-source projects. ☆209 · Updated this week
- GLake: optimizing GPU memory management and IO transmission. ☆424 · Updated 2 months ago
- Hooks CUDA-related dynamic libraries using automated code generation tools. ☆145 · Updated last year
- HAMi-core compiles libvgpu.so, which enforces hard GPU resource limits inside containers ☆131 · Updated this week
- Device plugin for Volcano vGPU that supports hard resource isolation ☆58 · Updated 3 weeks ago
- ☆57 · Updated 4 years ago
- A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod ☆122 · Updated 2 years ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆270 · Updated this week
- Disaggregated serving system for Large Language Models (LLMs). ☆453 · Updated 5 months ago
- ☆119 · Updated 2 months ago
- A curated collection of noteworthy MLSys bloggers (algorithms/systems) ☆153 · Updated 3 weeks ago
- Kubernetes Operator for AI and Big Data Elastic Training ☆85 · Updated 2 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆78 · Updated this week
- GPU-scheduler-for-deep-learning ☆201 · Updated 4 years ago
- Automatic tuning for ML model deployment on Kubernetes ☆80 · Updated 2 months ago
- Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o… ☆65 · Updated this week
- Paper reading notes covering distributed systems, virtualization, networking, and machine learning ☆23 · Updated 4 years ago
- PyTorch distributed training acceleration framework ☆39 · Updated last week
- A tutorial for CUDA & PyTorch ☆126 · Updated last week
- Yoda is a Kubernetes scheduler based on GPU metrics ☆141 · Updated 2 years ago
- High-performance Transformer implementation in C++. ☆99 · Updated last week
- A summary of notable work on optimizing LLM inference ☆51 · Updated last month
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆90 · Updated last year
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance. ☆47 · Updated this week
- Device plugins for Volcano, e.g. GPU ☆113 · Updated 4 months ago
- Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of pap… ☆213 · Updated last month
- A self-learning tutorial for CUDA high-performance programming. ☆337 · Updated last month
- LLM theoretical performance analysis tools supporting parameter, FLOPs, memory, and latency analysis (a back-of-the-envelope sketch follows this list). ☆76 · Updated 3 weeks ago
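To illustrate the roofline comparison referenced in the first item above, here is a minimal sketch. The `roofline` helper and the GPU figures in it are illustrative assumptions, not code or data from the listed repository:

```python
# Minimal roofline sketch: attainable throughput is bounded by
# min(peak compute, memory bandwidth x arithmetic intensity).
# All hardware numbers below are illustrative assumptions.

def roofline(peak_flops: float, mem_bw: float, arithmetic_intensity: float) -> float:
    """Attainable FLOP/s for a kernel with the given FLOPs-per-byte ratio."""
    return min(peak_flops, mem_bw * arithmetic_intensity)

# Batch-1 LLM decode re-reads the weights for every token, so its arithmetic
# intensity is roughly (2 FLOPs per parameter) / (bytes per parameter).
ai_decode = 2 / 2  # FP16 weights -> ~1 FLOP per byte, i.e. strongly memory-bound
attainable = roofline(peak_flops=300e12, mem_bw=2e12, arithmetic_intensity=ai_decode)
print(f"decode attainable: {attainable / 1e12:.1f} TFLOP/s vs 300.0 TFLOP/s peak")
```

Plotting attainable FLOP/s against arithmetic intensity for several accelerators gives the kind of cross-hardware comparison that repository describes.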
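In the same spirit as the theoretical performance analysis tooling in the last item, a back-of-the-envelope sketch of parameter, FLOPs, memory, and latency estimates for a dense Transformer. The model size, precision, and bandwidth are assumed values, and the ~2 FLOPs-per-parameter-per-token approximation ignores attention over the KV cache:

```python
# Back-of-the-envelope decode estimates for a dense Transformer.
# Model size, precision, and bandwidth are assumed for illustration only.

params = 7e9                   # 7B-parameter model (assumption)
bytes_per_param = 2            # FP16 weights
mem_bw = 2e12                  # 2 TB/s HBM bandwidth (assumption)

weight_bytes = params * bytes_per_param   # memory footprint of the weights
flops_per_token = 2 * params              # ~2 FLOPs per parameter per decoded token
latency_floor = weight_bytes / mem_bw     # batch-1 decode must stream all weights once

print(f"weights: {weight_bytes / 1e9:.0f} GB")
print(f"FLOPs per token: {flops_per_token / 1e9:.0f} GFLOPs")
print(f"decode latency lower bound: {latency_floor * 1e3:.1f} ms/token")
```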