quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆67 · Updated 2 months ago
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆80 · Updated this week
- Notes on quantization in neural networks ☆103 · Updated last year
- This repository contains tutorials and examples for Triton Inference Server ☆782 · Updated last week
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,422 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆498 · Updated this week
- Slides, notes, and materials for the workshop ☆331 · Updated last year
- The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.)… ☆798 · Updated this week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆68 · Updated 3 weeks ago
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆418 · Updated last week
- Pre-built components and code samples to help you build and deploy production-grade AI applications with the OpenVINO™ Toolkit from Intel ☆174 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆61 · Updated 2 weeks ago
- The Triton backend for the ONNX Runtime. ☆162 · Updated last week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆400 · Updated this week
- Run Generative AI models with a simple C++/Python API using the OpenVINO Runtime ☆345 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆793 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆198 · Updated this week
- The Triton backend for TensorRT. ☆78 · Updated 3 weeks ago
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson ☆213 · Updated last year
- Advanced quantization algorithms for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU ☆647 · Updated last week
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it ☆622 · Updated 2 weeks ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆245 · Updated last year
- Common utilities for ONNX converters ☆281 · Updated last month
- An innovative library for efficient LLM inference via low-bit quantization ☆350 · Updated last year
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆315 · Updated last week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models ☆68 · Updated this week
- NVIDIA tools guide ☆142 · Updated 9 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆491 · Updated 3 weeks ago
- The Triton backend for PyTorch TorchScript models. ☆159 · Updated last week
- ☆203 · Updated 9 months ago
- Some CUDA example code with READMEs ☆174 · Updated 7 months ago