HabanaAI / Gaudi-tutorials
Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://developer.habana.ai/
☆59 · Updated last week
Alternatives and similar repositories for Gaudi-tutorials:
Users interested in Gaudi-tutorials are comparing it to the libraries listed below.
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆62 · Updated 2 months ago
- oneCCL Bindings for Pytorch* ☆95 · Updated 2 weeks ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆183 · Updated 11 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆186 · Updated this week
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆106 · Updated 3 weeks ago
- Collection of kernels written in Triton language ☆121 · Updated last month
- ☆145 · Updated 2 years ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆117 · Updated last year
- ☆52 · Updated last week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆205 · Updated last year
- Intel® Tensor Processing Primitives extension for Pytorch* ☆15 · Updated 2 weeks ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆33 · Updated last month
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆49 · Updated last year
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated 4 months ago
- Fast low-bit matmul kernels in Triton ☆297 · Updated this week
- Cataloging released Triton kernels. ☆220 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆69 · Updated this week
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆51 · Updated last year
- An experimentation platform for LLM inference optimisation ☆29 · Updated 7 months ago
- Explainable AI Tooling (XAI). XAI is used to discover and explain a model's prediction in a way that is interpretable to the user. Releva… ☆37 · Updated last month
- ☆26 · Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 · Updated last month
- LLM Inference with Microscaling Format ☆22 · Updated 5 months ago
- Pytorch distributed backend extension with compression support ☆16 · Updated last month
- End to End steps for adding custom ops in PyTorch. ☆22 · Updated 4 years ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆109 · Updated 5 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆122 · Updated this week
- ☆30 · Updated this week
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ☆60 · Updated last year