HabanaAI / Model-References
Reference models for Intel(R) Gaudi(R) AI Accelerator
☆155 · Updated this week
Related projects
Alternatives and complementary repositories for Model-References
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆57 · Updated 2 months ago
- oneCCL Bindings for Pytorch* ☆86 · Updated last week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆152 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆41 · Updated this week
- This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications. ☆145 · Updated this week
- Issues related to MLPerf™ training policies, including rules and suggested changes ☆92 · Updated last month
- Distributed preprocessing and data loading for language datasets ☆39 · Updated 6 months ago
- ☆109 · Updated 7 months ago
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆34 · Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆159 · Updated last week
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆96 · Updated this week
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆57 · Updated this week
- Large Language Model Text Generation Inference on Habana Gaudi ☆26 · Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆316 · Updated 3 weeks ago
- A library to analyze PyTorch traces. ☆297 · Updated this week
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆37 · Updated 8 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆352 · Updated 5 months ago
- Torch Distributed Experimental ☆116 · Updated 3 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆146 · Updated this week
- RCCL Performance Benchmark Tests ☆48 · Updated 2 weeks ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆122 · Updated this week
- ROCm Communication Collectives Library (RCCL) ☆267 · Updated this week
- Implementation of a Transformer, but completely in Triton ☆248 · Updated 2 years ago
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- FTPipe and related pipeline model parallelism research. ☆41 · Updated last year
- The Triton backend for the PyTorch TorchScript models. ☆123 · Updated this week
- MLPerf™ logging library ☆30 · Updated last week
- A schedule language for large model training ☆141 · Updated 4 months ago
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆113 · Updated 11 months ago
- Benchmarks to capture important workloads. ☆28 · Updated 5 months ago