HabanaAI / Gaudi-tutorials
Tutorials for running models on first-gen Gaudi and Gaudi2 for training and inference. These are the source files for the tutorials on https://developer.habana.ai/
☆57 · Updated this week
Alternatives and similar repositories for Gaudi-tutorials:
Users interested in Gaudi-tutorials are comparing it to the libraries listed below.
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆159 · Updated this week
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆150 · Updated 9 months ago
- ☆139 · Updated last year
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆201 · Updated last year
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated 2 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆116 · Updated 11 months ago
- Cataloging released Triton kernels ☆176 · Updated last month
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 · Updated 7 months ago
- oneCCL Bindings for Pytorch* ☆89 · Updated 2 months ago
- Provides examples for writing and building Habana custom kernels using the HabanaTools ☆20 · Updated 3 months ago
- An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5) ☆237 · Updated 4 months ago
- An experimentation platform for LLM inference optimisation ☆29 · Updated 5 months ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware ☆106 · Updated 3 months ago
- Collection of kernels written in the Triton language ☆107 · Updated last week
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated 11 months ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆33 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆173 · Updated this week
- ☆20 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆56 · Updated this week
- ☆109 · Updated 2 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆61 · Updated 2 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆276 · Updated last month
- ☆34 · Updated this week
- SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models ☆27 · Updated 6 months ago
- Fast low-bit matmul kernels in Triton ☆250 · Updated last week
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 · Updated 2 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆102 · Updated 4 months ago
- ☆71 · Updated 3 months ago
- Ongoing research training transformer models at scale ☆16 · Updated this week
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆96 · Updated 2 months ago