HabanaAI / Gaudi-tutorials
Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://developer.habana.ai/
☆56 · Updated this week
Alternatives and similar repositories for Gaudi-tutorials:
Users who are interested in Gaudi-tutorials compare it to the libraries listed below:
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆159 · Updated last week
- Intel Gaudi's Megatron-DeepSpeed for training large language models ☆13 · Updated last month
- Large Language Model Text Generation Inference on Habana Gaudi ☆31 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆50 · Updated this week
- Collection of kernels written in the Triton language ☆91 · Updated 3 months ago
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU) ☆166 · Updated this week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆114 · Updated 10 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆195 · Updated last year
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 · Updated last month
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆93 · Updated last month
- CUTLASS and CuTe examples ☆37 · Updated 3 weeks ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆135 · Updated 8 months ago
- oneCCL bindings for PyTorch* ☆87 · Updated 3 weeks ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated 10 months ago
- OpenAI Triton backend for Intel® GPUs ☆157 · Updated this week
- Cataloging released Triton kernels ☆157 · Updated 3 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆58 · Updated last month
- Notes on quantization in neural networks ☆66 · Updated last year
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective ☆12 · Updated last month
- Examples for writing and building Habana custom kernels using HabanaTools ☆19 · Updated last month
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆186 · Updated last week
- ☆132 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆180 · Updated 2 months ago
- Get down and dirty with FlashAttention 2.0 in PyTorch; plug and play, no complex CUDA kernels ☆102 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆101 · Updated 3 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆221 · Updated last week
- Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high … ☆55 · Updated 3 months ago
- ☆34 · Updated this week
- Applied AI experiments and examples for PyTorch ☆216 · Updated last week
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆232 · Updated 3 months ago