HabanaAI / Gaudi-tutorials
Tutorials for running models on first-gen Gaudi and Gaudi2 for training and inference. The source files for the tutorials hosted at https://developer.habana.ai/
Related projects:
- Reference models for the Intel(R) Gaudi(R) AI Accelerator
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU)
- A high-throughput and memory-efficient inference and serving engine for LLMs
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
- Large Language Model Text Generation Inference on Habana Gaudi
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- Collection of kernels written in the Triton language
- Notes on quantization in neural networks
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
- Qualcomm Cloud AI SDK (Platform and Apps) enabling high-performance deep learning inference on Qualcomm Cloud AI platforms delivering high …
- Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
- An experimentation platform for LLM inference optimisation
- Fast Hadamard transform in CUDA, with a PyTorch interface
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note…
- Applied AI experiments and examples for PyTorch
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch-native components
- Intel Gaudi's Megatron-DeepSpeed Large Language Models for training
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning"
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5)
- KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
- This repository contains the experimental PyTorch-native float8 training UX
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools