quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across computer vision, object detection, natural language processing, and generative AI models.
☆62 · Updated 2 months ago
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆74 · Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆62 · Updated last week
- Notes on quantization in neural networks ☆90 · Updated last year
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆366 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆480 · Updated this week
- The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.)… ☆745 · Updated last week
- Slides, notes, and materials for the workshop ☆327 · Updated last year
- AI Edge Quantizer: flexible post-training quantization for LiteRT models. ☆54 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆736 · Updated this week
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,068 · Updated last week
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆407 · Updated 2 weeks ago
- Run Generative AI models with a simple C++/Python API using OpenVINO Runtime ☆310 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆190 · Updated this week
- The Triton backend for TensorRT. ☆77 · Updated this week
- The Triton backend for the ONNX Runtime. ☆156 · Updated this week
- Some CUDA example code with READMEs. ☆169 · Updated 4 months ago
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆531 · Updated this week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆59 · Updated this week
- Model compression for ONNX ☆96 · Updated 8 months ago
- ☆162 · Updated last year
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆237 · Updated 10 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆202 · Updated 2 months ago
- Training an MLP on MNIST in 1.5 seconds with pure CUDA ☆47 · Updated 8 months ago
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption ☆103 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated 10 months ago
- The Triton backend for PyTorch TorchScript models. ☆157 · Updated last week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆307 · Updated last month
- A PyTorch quantization backend for Optimum ☆971 · Updated 3 weeks ago
- A toolkit to help optimize ONNX models ☆181 · Updated this week
- ☆99 · Updated 10 months ago
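Several of the repositories above are post-training quantization toolkits. As a rough illustration of the core idea they share, here is a minimal affine (asymmetric) quantization round-trip in pure Python; the function names and the tiny weight list are hypothetical and not taken from any of the listed libraries:

```python
def quantize_affine(values, num_bits=8):
    """Map floats to unsigned num_bits integers via a scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The quantized range must cover 0.0 so zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.62, 0.0, 0.3, 1.24]
q, scale, zp = quantize_affine(weights)
approx = dequantize_affine(q, scale, zp)
# Round-trip error is bounded by half a quantization step.
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

Real toolkits add calibration, per-channel scales, and operator fusion on top, but the scale/zero-point mapping sketched here is the common foundation; forcing the range to include zero is what lets padding and ReLU zeros survive quantization exactly.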