quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆70 · Updated this week
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor…☆84 · Updated this week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th…☆427 · Updated this week
- Notes on quantization in neural networks☆111 · Updated last year
- AI Edge Quantizer: flexible post training quantization for LiteRT models.☆81 · Updated 3 weeks ago
- ☆123 · Updated 3 weeks ago
- Slides, notes, and materials for the workshop☆335 · Updated last year
- The Triton backend for the ONNX Runtime.☆168 · Updated last week
- Inference Vision Transformer (ViT) in plain C/C++ with ggml☆300 · Updated last year
- Supporting PyTorch models with the Google AI Edge TFLite runtime (see the conversion sketch after this list).☆855 · Updated this week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python (see the authoring sketch after this list).☆412 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server☆810 · Updated 3 weeks ago
- Common utilities for ONNX converters (see the float16 sketch after this list)☆288 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs (see the offline-inference sketch after this list)☆267 · Updated last year
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen…☆72 · Updated 3 weeks ago
- ☆168 · Updated 3 weeks ago
- The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.)…☆855 · Updated last week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX☆166 · Updated this week
- The Triton backend for TensorRT.☆80 · Updated 3 weeks ago
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev…☆62 · Updated 2 months ago
- making the official triton tutorials actually comprehensible☆78 · Updated 3 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel☆196 · Updated 2 years ago
- Fast low-bit matmul kernels in Triton☆402 · Updated 2 weeks ago
- A Toolkit to Help Optimize Onnx Model☆267 · Updated last week
- Model compression for ONNX☆99 · Updated last year
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…☆320 · Updated 2 months ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai☆104 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk☆214 · Updated this week
- Common source, scripts and utilities for creating Triton backends.☆361 · Updated 3 weeks ago
- ☆712 · Updated this week
- A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresse…☆1,605 · Updated this week
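The PyTorch-on-LiteRT entry above appears to describe Google's ai-edge-torch package. A minimal conversion sketch follows, assuming ai-edge-torch and torchvision are installed; the ResNet-18 model and the output filename are illustrative choices, not taken from that repository.

```python
# Minimal ai-edge-torch sketch (assumes `pip install ai-edge-torch torchvision`).
import torch
import torchvision
import ai_edge_torch

# Any exportable PyTorch model in eval mode; ResNet-18 is only an example.
model = torchvision.models.resnet18(weights=None).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert the PyTorch model into a LiteRT (TFLite) edge model.
edge_model = ai_edge_torch.convert(model, sample_inputs)

# The converted model can be called directly for a quick parity check...
_ = edge_model(*sample_inputs)

# ...and exported to disk for use with the LiteRT runtime.
edge_model.export("resnet18.tflite")
```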
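The ONNX Script entry describes authoring ONNX functions in plain Python. The sketch below shows that authoring style under the assumption that the onnxscript package is installed; the gelu_tanh function, its constants, and the opset choice are illustrative, not taken from the repository.

```python
# Minimal ONNX Script authoring sketch (assumes `pip install onnxscript`).
from onnxscript import FLOAT, script
from onnxscript import opset18 as op


@script()
def gelu_tanh(x: FLOAT[...]) -> FLOAT[...]:
    # Tanh approximation of GELU, written as ordinary Python and
    # translated by onnxscript into an ONNX function graph.
    inner = 0.7978845608 * (x + 0.044715 * x * x * x)
    return 0.5 * x * (1.0 + op.Tanh(inner))


# Export the scripted function as a standalone ONNX model.
model_proto = gelu_tanh.to_model_proto()
```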
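The "Common utilities for ONNX converters" entry matches the onnxconverter-common package, whose float32-to-float16 conversion helper is widely used. A minimal sketch, assuming onnx and onnxconverter-common are installed and that a float32 model exists at the illustrative path below:

```python
# Minimal onnxconverter-common sketch (assumes `pip install onnx onnxconverter-common`).
import onnx
from onnxconverter_common import float16

# Load an existing float32 ONNX model; the paths are illustrative.
model = onnx.load("model_fp32.onnx")

# Convert initializers and tensor types to float16 where safe.
model_fp16 = float16.convert_float_to_float16(model)

onnx.save(model_fp16, "model_fp16.onnx")
```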
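The inference-and-serving entry carries vLLM's description; assuming it exposes the standard vLLM offline generation API, usage looks like the sketch below (the facebook/opt-125m checkpoint is chosen only to keep the example small).

```python
# Minimal vLLM offline-inference sketch (assumes `pip install vllm`).
from vllm import LLM, SamplingParams

# A small checkpoint keeps the example lightweight; any supported causal LM works.
llm = LLM(model="facebook/opt-125m")

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], sampling)

for out in outputs:
    # Each output carries the prompt and its generated completions.
    print(out.prompt, out.outputs[0].text)
```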