quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆66 · Updated 2 months ago
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆82 · Updated this week
- Slides, notes, and materials for the workshop ☆333 · Updated last year
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆61 · Updated last month
- Notes on quantization in neural networks ☆104 · Updated last year
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆72 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆318 · Updated last month
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆502 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆792 · Updated 2 weeks ago
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆419 · Updated last week
- The Triton backend for TensorRT. ☆79 · Updated 2 weeks ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆404 · Updated this week
- The Triton backend for PyTorch TorchScript models. ☆162 · Updated last week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆199 · Updated last week
- The Triton backend for the ONNX Runtime. ☆162 · Updated 2 weeks ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆70 · Updated 2 weeks ago
- Run Generative AI models with a simple C++/Python API using the OpenVINO Runtime ☆364 · Updated this week
- SandLogic Lexicons ☆19 · Updated last month
- Advanced quantization algorithms for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU. ☆679 · Updated this week
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,464 · Updated this week
- ☆115 · Updated 2 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.)… ☆815 · Updated 2 weeks ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆161 · Updated this week
- Inference of Vision Transformer (ViT) models in plain C/C++ with ggml ☆295 · Updated last year
- Common utilities for ONNX converters ☆283 · Updated last month
- Fast low-bit matmul kernels in Triton ☆385 · Updated last week
- Training an MLP on MNIST in 1.5 seconds with pure CUDA ☆46 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- Making the official Triton tutorials actually comprehensible ☆57 · Updated 2 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆495 · Updated this week
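Several of the libraries above (AI Edge Quantizer, MCT, QONNX, the low-bit LLM inference engines) center on post-training quantization. As a rough illustration of the shared core idea — not the actual API of any listed project — the following sketch shows affine (asymmetric) int8 quantization, which maps a float range onto the int8 range via a scale and zero-point:

```python
# Hedged sketch of affine int8 post-training quantization.
# Illustrative only; no listed project's real API is used here.

def quantize_int8(values):
    """Map floats to int8 codes using a per-tensor scale and zero-point."""
    lo, hi = min(values), max(values)
    qmin, qmax = -128, 127
    # Scale: float range spread over the 256 representable int8 codes.
    scale = (hi - lo) / (qmax - qmin) if hi != lo else 1.0
    # Zero-point: the int8 code that represents float 0.0.
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(code - zero_point) * scale for code in q]

vals = [-1.0, 0.0, 0.5, 2.0]
q, s, z = quantize_int8(vals)
restored = dequantize(q, s, z)
```

Real toolkits layer calibration, per-channel scales, and operator-aware rewrites on top of this, but the quantize/dequantize round trip (with error bounded by roughly one quantization step) is the common foundation.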