quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆66 · Updated last month
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆75 · Updated last week
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ☆60 · Updated last week
- Notes on quantization in neural networks ☆97 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆377 · Updated last week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆414 · Updated last month
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆67 · Updated last month
- The Triton backend for the ONNX Runtime. ☆159 · Updated last week
- This repository contains tutorials and examples for Triton Inference Server ☆763 · Updated last month
- Slides, notes, and materials for the workshop ☆331 · Updated last year
- The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.)… ☆777 · Updated last week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆61 · Updated last week
- The Triton backend for TensorRT. ☆77 · Updated last month
- ☆194 · Updated 8 months ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆486 · Updated last week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆193 · Updated last week
- Common utilities for ONNX converters ☆277 · Updated last week
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,200 · Updated this week
- A Toolkit to Help Optimize Onnx Model ☆205 · Updated last week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆763 · Updated this week
- Common source, scripts and utilities for creating Triton backends. ☆345 · Updated last month
- The Triton backend for the PyTorch TorchScript models. ☆159 · Updated last month
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆74 · Updated this week
- Some CUDA example code with READMEs. ☆170 · Updated 6 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆485 · Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆192 · Updated 3 months ago
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆209 · Updated 4 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆310 · Updated 2 weeks ago
- OpenVINO backend for Triton. ☆33 · Updated last week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆409 · Updated last week
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆240 · Updated 11 months ago