quic / cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆67 · Updated 3 months ago
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- Notes on quantization in neural networks ☆105 · Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆83 · Updated this week
- Slides, notes, and materials for the workshop ☆334 · Updated last year
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆62 · Updated 2 months ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆507 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆800 · Updated this week
- AI Edge Quantizer: flexible post training quantization for LiteRT models. ☆76 · Updated this week
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆828 · Updated last week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆425 · Updated this week
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,529 · Updated this week
- The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.)… ☆839 · Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆201 · Updated last week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆317 · Updated last month
- ☆558 · Updated this week
- Some CUDA example code with READMEs. ☆178 · Updated last week
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆408 · Updated this week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆71 · Updated this week
- Common utilities for ONNX converters ☆284 · Updated 2 months ago
- Run Generative AI models with simple C++/Python API and using OpenVINO Runtime ☆371 · Updated this week
- Accelerate Model Training with PyTorch 2.X, published by Packt ☆47 · Updated last week
- A pytorch quantization backend for optimum ☆1,009 · Updated 3 weeks ago
- SandLogic Lexicons ☆19 · Updated 2 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- ☆118 · Updated this week
- Advanced quantization toolkit for LLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with Transform… ☆712 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- The Triton backend for the ONNX Runtime. ☆166 · Updated last week
- ☆312 · Updated this week
- 🤗 Optimum ExecuTorch ☆80 · Updated last week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆496 · Updated this week