quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆58 · Updated 5 months ago
Alternatives and similar repositories for cloud-ai-sdk:
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆62 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆59 · Updated last week
- CUDA Matrix Multiplication Optimization ☆179 · Updated 9 months ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆62 · Updated last month
- ☆94 · Updated 7 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆62 · Updated last month
- OpenVINO backend for Triton. ☆31 · Updated this week
- Notes on quantization in neural networks ☆79 · Updated last year
- The Triton backend for the ONNX Runtime. ☆140 · Updated this week
- Some CUDA example code with READMEs. ☆94 · Updated last month
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆318 · Updated this week
- SandLogic Lexicons ☆18 · Updated 6 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated 2 weeks ago
- Slides, notes, and materials for the workshop ☆324 · Updated 10 months ago
- Model compression for ONNX ☆91 · Updated 5 months ago
- ☆62 · Updated 5 months ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆32 · Updated this week
- The Triton backend for TensorRT. ☆71 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆145 · Updated this week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆471 · Updated this week
- oneCCL Bindings for Pytorch* ☆94 · Updated last week
- ☆38 · Updated this week
- ☆155 · Updated 3 months ago
- Training MLP on MNIST in 1.5 seconds with pure CUDA ☆44 · Updated 5 months ago
- Home for OctoML PyTorch Profiler ☆112 · Updated last year
- ☆131 · Updated last month
- Fast low-bit matmul kernels in Triton ☆291 · Updated this week
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. ☆193 · Updated 10 months ago
- The Triton backend for the PyTorch TorchScript models. ☆146 · Updated this week
- Cataloging released Triton kernels. ☆217 · Updated 3 months ago