quic / cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆70 · Updated 3 weeks ago
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆85 · Updated this week
- Notes on quantization in neural networks ☆114 · Updated 2 years ago
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆428 · Updated last week
- This repository contains tutorials and examples for Triton Inference Server ☆813 · Updated 3 weeks ago
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆84 · Updated last week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆73 · Updated 3 weeks ago
- Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) an… ☆873 · Updated 2 weeks ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆880 · Updated last week
- Slides, notes, and materials for the workshop ☆337 · Updated last year
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆304 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆414 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆62 · Updated 3 months ago
- Visualize ONNX models with model-explorer ☆66 · Updated 2 weeks ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆522 · Updated this week
- Some CUDA example code with READMEs. ☆179 · Updated last month
- The Triton backend for the ONNX Runtime. ☆170 · Updated 3 weeks ago
- ☆128 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 3 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆203 · Updated last week
- Fast low-bit matmul kernels in Triton ☆413 · Updated last week
- Model compression for ONNX ☆99 · Updated last year
- Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Tra… ☆785 · Updated this week
- Making the official Triton tutorials actually comprehensible ☆80 · Updated 4 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆325 · Updated 3 months ago
- Common utilities for ONNX converters ☆289 · Updated last week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆168 · Updated this week
- ☆172 · Updated last week
- ☆855 · Updated this week
- The Triton backend for TensorRT. ☆82 · Updated 3 weeks ago