quic / cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆71 · Updated last month
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… (sketch below) ☆85 · Updated this week
- ☆131 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆63 · Updated 4 months ago
- Notes on quantization in neural networks (worked sketch below) ☆114 · Updated 2 years ago
- This repository contains tutorials and examples for Triton Inference Server (client sketch below) ☆813 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) (sketch below) ☆204 · Updated last week
- The Triton backend for TensorRT. ☆83 · Updated this week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆73 · Updated this week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆502 · Updated this week
- Slides, notes, and materials for the workshop ☆338 · Updated last year
- The Triton backend for the ONNX Runtime. ☆171 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX (sketch below) ☆170 · Updated last week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools (sketch below) ☆528 · Updated this week
- Some CUDA example code with READMEs. ☆179 · Updated 2 months ago
- Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) an… ☆890 · Updated this week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆88 · Updated this week
- The Triton backend for PyTorch TorchScript models. ☆170 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. (sketch below) ☆418 · Updated this week
- ☆324 · Updated this week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… (sketch below) ☆431 · Updated this week
- Run Generative AI models with a simple C++/Python API using OpenVINO Runtime (sketch below) ☆414 · Updated this week
- ☆135 · Updated this week
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Updated 9 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆326 · Updated 3 months ago
- OpenVINO backend for Triton. ☆36 · Updated last week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. (sketch below) ☆903 · Updated this week
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆257 · Updated last year
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption ☆109 · Updated 2 years ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆305 · Updated last year
- Profile PyTorch models for FLOPs and parameters, helping to evaluate computational efficiency and memory usage. ☆115 · Updated 2 weeks ago
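The sketches below illustrate a few of the libraries above. All are minimal, untested examples with placeholder model names and paths, not definitive implementations.

For the HF-porting library at the top of the list, assumed here to be Qualcomm's efficient-transformers (QEfficient) package, a sketch of the load/compile/generate flow. The class name and the compile()/generate() signatures are recalled from its README and may differ by version:

```python
# Hedged sketch, assuming Qualcomm's QEfficient package; names unverified.
from transformers import AutoTokenizer
from QEfficient import QEFFAutoModelForCausalLM  # assumed import path

model_id = "gpt2"  # placeholder: any HF causal-LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = QEFFAutoModelForCausalLM.from_pretrained(model_id)

model.compile(num_cores=14)  # compile for Cloud AI 100; args vary by SDK version
model.generate(prompts=["Hello, world"], tokenizer=tokenizer)
```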
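For the quantization notes, a generic illustration of the affine (asymmetric) quantize/dequantize arithmetic such notes typically cover; this is not code from that repository:

```python
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    """Affine quantization: map the float range [min, max] onto integers [0, 2^b - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))  # aligns x.min with qmin
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(8).astype(np.float32)
q, s, z = quantize(x)
print(np.abs(x - dequantize(q, s, z)).max())  # rounding error is bounded by scale / 2
```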
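For the Triton Inference Server tutorials, a minimal HTTP client request against a running server; the model name, tensor names, and shape are placeholders for whatever your model repository actually serves:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
inp = httpclient.InferInput("input__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("output__0").shape)
```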
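For Optimum Habana, a sketch of its drop-in Trainer replacements; dataset wiring is elided, and the gaudi_config_name shown is one of the configs Habana publishes on the Hub:

```python
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,        # run on HPU
    use_lazy_mode=True,     # lazy-mode graph execution
    gaudi_config_name="Habana/bert-base-uncased",
)
# Pass train_dataset=/eval_dataset= as with a regular transformers Trainer.
trainer = GaudiTrainer(model=model, args=args)
```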
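For QONNX, a sketch of loading a model with its ModelWrapper and applying a graph transformation; "model.onnx" is a placeholder path:

```python
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.infer_shapes import InferShapes

model = ModelWrapper("model.onnx")      # placeholder: a (Q)ONNX file on disk
model = model.transform(InferShapes())  # transformations return a new wrapper
for node in model.graph.node:
    print(node.op_type)                 # Quant nodes carry arbitrary bit widths
model.save("model_shapes.onnx")
```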
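For Optimum Intel, exporting a HF checkpoint to OpenVINO on the fly and running it through a regular transformers pipeline:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"  # placeholder checkpoint
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # convert on load
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("OpenVINO makes inference", max_new_tokens=20)[0]["generated_text"])
```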
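For ONNX Script, an ONNX function authored in plain Python, closely following the project's own Selu example:

```python
from onnxscript import script
from onnxscript import opset15 as op

@script()
def Selu(X, alpha: float, gamma: float):
    # Each op.* call maps to an ONNX operator; Python expressions become graph nodes.
    alphaX = op.CastLike(alpha, X)
    gammaX = op.CastLike(gamma, X)
    neg = gammaX * (alphaX * op.Exp(X) - alphaX)
    pos = gammaX * X
    zero = op.CastLike(0, X)
    return op.Where(X <= zero, neg, pos)

fn_proto = Selu.to_function_proto()  # serialize to an ONNX FunctionProto
```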
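For the Model Compression Toolkit, a post-training quantization sketch; the mct.ptq entry point and its signature are recalled from the project docs and should be checked against the current release:

```python
import model_compression_toolkit as mct  # assumed entry point, verify per release
import torch
from torchvision.models import mobilenet_v2

float_model = mobilenet_v2(weights="DEFAULT")

def representative_dataset():
    # Yield batches of calibration inputs; random data only for the sketch.
    for _ in range(10):
        yield [torch.randn(1, 3, 224, 224)]

quantized_model, quant_info = mct.ptq.pytorch_post_training_quantization(
    float_model, representative_dataset
)
```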
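For openvino.genai, the LLMPipeline API; "./model_dir" must already contain an OpenVINO-converted LLM (e.g. exported with optimum-cli):

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./model_dir", "CPU")  # placeholder model folder
print(pipe.generate("What is OpenVINO?", max_new_tokens=64))
```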
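For the PyTorch-to-LiteRT repo (ai-edge-torch), converting a module to a TFLite flatbuffer; the sample inputs drive the export tracing:

```python
import ai_edge_torch
import torch
from torchvision.models import resnet18

model = resnet18(weights="DEFAULT").eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

edge_model = ai_edge_torch.convert(model, sample_inputs)
print(edge_model(*sample_inputs).shape)  # runs via the bundled TFLite runtime
edge_model.export("resnet18.tflite")     # flatbuffer for on-device deployment
```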