quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆58 · Updated 5 months ago
Alternatives and similar repositories for cloud-ai-sdk:
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆62 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆59 · Updated last week
- CUDA Matrix Multiplication Optimization ☆179 · Updated 9 months ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆62 · Updated last month
- ☆94 · Updated 7 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆62 · Updated last month
- OpenVINO backend for Triton. ☆31 · Updated this week
- Notes on quantization in neural networks ☆79 · Updated last year
- The Triton backend for the ONNX Runtime. ☆140 · Updated this week
- Some CUDA example code with READMEs. ☆94 · Updated last month
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆318 · Updated this week
- SandLogic Lexicons ☆18 · Updated 6 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated 2 weeks ago
- Slides, notes, and materials for the workshop ☆324 · Updated 10 months ago
- Model compression for ONNX ☆91 · Updated 5 months ago
- ☆62 · Updated 5 months ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆32 · Updated this week
- The Triton backend for TensorRT. ☆71 · Updated this week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆145 · Updated this week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆471 · Updated this week
- oneCCL Bindings for Pytorch* ☆94 · Updated last week
- ☆38 · Updated this week
- ☆155 · Updated 3 months ago
- Training MLP on MNIST in 1.5 seconds with pure CUDA ☆44 · Updated 5 months ago
- Home for OctoML PyTorch Profiler ☆112 · Updated last year
- ☆131 · Updated last month
- Fast low-bit matmul kernels in Triton ☆291 · Updated this week
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. ☆193 · Updated 10 months ago
- The Triton backend for the PyTorch TorchScript models. ☆146 · Updated this week
- Cataloging released Triton kernels. ☆217 · Updated 3 months ago