quic / cloud-ai-sdk
Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across Computer Vision, Object Detection, Natural Language Processing, and Generative AI models.
☆60 Updated 6 months ago
Alternatives and similar repositories for cloud-ai-sdk
Users interested in cloud-ai-sdk are comparing it to the libraries listed below.
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆65 Updated this week
- Notes on quantization in neural networks ☆82 Updated last year
- Slides, notes, and materials for the workshop ☆325 Updated 11 months ago
- ☆146 Updated 2 years ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆62 Updated 2 weeks ago
- The Triton backend for the ONNX Runtime. ☆144 Updated this week
- Visualize ONNX models with model-explorer ☆33 Updated 2 months ago
- ☆94 Updated 7 months ago
- Training MLP on MNIST in 1.5 seconds with pure CUDA ☆46 Updated 6 months ago
- Model compression for ONNX ☆92 Updated 5 months ago
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆59 Updated this week
- CUDA Matrix Multiplication Optimization ☆186 Updated 9 months ago
- The Triton backend for the PyTorch TorchScript models. ☆149 Updated this week
- ☆163 Updated 4 months ago
- The Triton backend for TensorRT. ☆74 Updated this week
- ☆204 Updated 3 years ago
- Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. Th… ☆394 Updated this week
- Some CUDA example code with READMEs. ☆98 Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆299 Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆171 Updated last week
- ☆155 Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆324 Updated this week
- SandLogic Lexicons ☆19 Updated 6 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆474 Updated 3 weeks ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆185 Updated 11 months ago
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆148 Updated this week
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. ☆195 Updated 11 months ago
- nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculat… ☆909 Updated this week
- Applied AI experiments and examples for PyTorch ☆265 Updated 2 weeks ago
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆437 Updated 2 weeks ago