quic / cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆64 · Updated last week
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆74 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆61 · Updated last week
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆58 · Updated last week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆410 · Updated last month
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,103 · Updated this week
- Notes on quantization in neural networks ☆96 · Updated last year
- Slides, notes, and materials for the workshop ☆329 · Updated last year
- This repository contains tutorials and examples for Triton Inference Server ☆755 · Updated last week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆66 · Updated last week
- The Triton backend for TensorRT. ☆78 · Updated last week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆482 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆735 · Updated last week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆369 · Updated this week
- Training MLP on MNIST in 1.5 seconds with pure CUDA ☆46 · Updated 9 months ago
- The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.)… ☆766 · Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆192 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆308 · Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆341 · Updated last week
- The Triton backend for the PyTorch TorchScript models. ☆158 · Updated last week
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆193 · Updated 7 months ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆238 · Updated 11 months ago
- ☆99 · Updated 11 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆483 · Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆347 · Updated this week
- The Triton backend for the ONNX Runtime. ☆157 · Updated last week
- ☆102 · Updated this week
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆291 · Updated last year
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson ☆207 · Updated last year
- Run Generative AI models with simple C++/Python API and using OpenVINO Runtime ☆319 · Updated this week
- A pytorch quantization backend for optimum ☆981 · Updated last month