HabanaAI / Gaudi-tutorials
Tutorials for running models on first-gen Gaudi and Gaudi2 for training and inference. These are the source files for the tutorials at https://developer.habana.ai/
☆59 · Updated this week
Alternatives and similar repositories for Gaudi-tutorials:
Users interested in Gaudi-tutorials are comparing it to the libraries listed below.
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated last month
- oneCCL Bindings for Pytorch* ☆91 · Updated this week
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 · Updated 3 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆62 · Updated 3 weeks ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆165 · Updated 10 months ago
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆181 · Updated this week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆203 · Updated last year
- Collection of kernels written in the Triton language ☆117 · Updated last month
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆116 · Updated last year
- Cataloging released Triton kernels ☆213 · Updated 2 months ago
- ☆49 · Updated 2 weeks ago
- Get down and dirty with FlashAttention 2.0 in PyTorch; plug and play, no complex CUDA kernels ☆102 · Updated last year
- Large Language Model Text Generation Inference on Habana Gaudi ☆32 · Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆275 · Updated this week
- ☆141 · Updated 2 years ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆105 · Updated 5 months ago
- ☆37 · Updated this week
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated 3 months ago
- Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high … ☆56 · Updated 5 months ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆100 · Updated 3 months ago
- ☆76 · Updated 4 months ago
- Development repository for the Triton language and compiler ☆114 · Updated this week
- Benchmarks to capture important workloads ☆30 · Updated 2 months ago
- OpenAI Triton backend for Intel® GPUs ☆172 · Updated this week
- Provides examples for writing and building Habana custom kernels using the HabanaTools ☆21 · Updated 4 months ago
- ☆29 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆62 · Updated this week
- Applied AI experiments and examples for PyTorch ☆251 · Updated 2 weeks ago
- CUDA Matrix Multiplication Optimization ☆177 · Updated 8 months ago
- ☆26 · Updated this week