quic / cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆50 · Updated 2 weeks ago
Related projects:
- ☆66 · Updated this week
- ☆113 · Updated last year
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆298 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆47 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆287 · Updated this week
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, sparsity, distillat… ☆439 · Updated this week
- ☆92 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX. ☆210 · Updated last month
- CUDA Matrix Multiplication Optimization ☆118 · Updated 2 months ago
- ☆186 · Updated 2 years ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆121 · Updated this week
- Applied AI experiments and examples for PyTorch ☆123 · Updated last month
- Notes on quantization in neural networks ☆54 · Updated 9 months ago
- Slides, notes, and materials for the workshop ☆297 · Updated 3 months ago
- PyTorch native quantization and sparsity for training and inference ☆748 · Updated this week
- ☆124 · Updated 7 months ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆143 · Updated last month
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆250 · Updated this week
- Converts tflite to JSON and makes it editable in the IDE; also converts the edited JSON back to tflite binary. ☆26 · Updated last year
- ☆124 · Updated last week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆106 · Updated 6 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆144 · Updated this week
- Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t… ☆205 · Updated this week
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson ☆145 · Updated 10 months ago
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆144 · Updated this week
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption ☆87 · Updated last year
- OpenAI Triton backend for Intel® GPUs ☆126 · Updated this week
- List of papers related to Vision Transformer quantization and hardware acceleration in recent AI conferences and journals ☆47 · Updated 3 months ago
- TAO Toolkit deep learning networks with PyTorch backend ☆81 · Updated 3 weeks ago
- ☆295 · Updated 9 months ago