quic / cloud-ai-sdk
The Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms, delivering high throughput and low latency across computer vision, object detection, natural language processing, and generative AI models.
☆55 Updated 3 months ago
Alternatives and similar repositories for cloud-ai-sdk:
Users who are interested in cloud-ai-sdk are comparing it to the libraries listed below:
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆57 Updated this week
- Slides, notes, and materials for the workshop ☆310 Updated 7 months ago
- Notes on quantization in neural networks ☆66 Updated last year
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev … ☆56 Updated this week
- Fast low-bit matmul kernels in Triton ☆199 Updated last week
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 Updated last month
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillati… ☆679 Updated 3 weeks ago
- SandLogic Lexicons ☆17 Updated 3 months ago
- A block oriented training approach for inference time optimization. ☆32 Updated 5 months ago
- ☆132 Updated last year
- ☆312 Updated last year
- ☆120 Updated last month
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆54 Updated 4 months ago
- CUDA Matrix Multiplication Optimization ☆155 Updated 6 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆219 Updated 5 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆186 Updated last week
- Cataloging released Triton kernels. ☆157 Updated 2 weeks ago
- ☆110 Updated 3 weeks ago
- Convert tflite to JSON and make it editable in the IDE. It also converts the edited JSON back to tflite binary. ☆27 Updated last year
- Home for OctoML PyTorch Profiler ☆107 Updated last year
- ☆140 Updated 11 months ago
- ☆100 Updated 2 months ago
- Applied AI experiments and examples for PyTorch ☆215 Updated last week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆171 Updated this week
- The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API. ☆132 Updated 2 weeks ago
- ☆171 Updated last week
- Collection of kernels written in Triton language ☆91 Updated 3 months ago
- Model compression for ONNX ☆81 Updated 2 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆114 Updated 10 months ago