intel / e2eAIOK
Intel® End-to-End AI Optimization Kit
☆31 · Updated last year
Alternatives and similar repositories for e2eAIOK
Users interested in e2eAIOK are comparing it to the libraries listed below.
- Benchmarks to capture important workloads. ☆32 · Updated 2 weeks ago
- Issues related to MLPerf® Inference policies, including rules and suggested changes. ☆63 · Updated this week
- oneCCL bindings for PyTorch* (deprecated). ☆104 · Updated last month
- ☆71 · Updated 10 months ago
- Deadline-based hyperparameter tuning on RayTune. ☆32 · Updated 6 years ago
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆36 · Updated last year
- Simple distributed deep learning on TensorFlow. ☆134 · Updated 7 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe. ☆125 · Updated last year
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models (ICML 2021). ☆55 · Updated 4 years ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on the Intel GPU (XPU) device. Note… ☆64 · Updated 7 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- Computation using data flow graphs for scalable machine learning. ☆68 · Updated this week
- Home for the OctoML PyTorch Profiler. ☆113 · Updated 2 years ago
- A memory-efficient DLRM training solution using ColossalAI. ☆105 · Updated 3 years ago
- Summary of system papers/frameworks/codes/tools on training or serving large models. ☆57 · Updated 2 years ago
- Intel Gaudi's Megatron DeepSpeed large language models for training. ☆18 · Updated last year
- Reference models for the Intel® Gaudi® AI Accelerator. ☆170 · Updated 3 weeks ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray. ☆131 · Updated 4 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆164 · Updated 3 weeks ago
- Fast and memory-efficient exact attention. ☆111 · Updated last week
- MLPerf™ logging library. ☆38 · Updated last month
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference. ☆46 · Updated 7 months ago
- Runtime tracing library for TensorFlow. ☆43 · Updated 7 years ago
- Standalone Flash Attention v2 kernel without the libtorch dependency. ☆113 · Updated last year
- Distributed preprocessing and data loading for language datasets. ☆40 · Updated last year
- ☆79 · Updated last year
- Inference framework for MoE layers based on TensorRT with Python bindings. ☆41 · Updated 4 years ago
- ☆125 · Updated last year
- Distributed AI/HPC monitoring framework. ☆29 · Updated 9 months ago
- LLM-Inference-Bench. ☆58 · Updated 6 months ago