NVIDIA / tensorflow
An Open Source Machine Learning Framework for Everyone
☆969Updated last month
Related projects:
- Optimized primitives for collective multi-GPU communication☆3,132Updated this week
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT☆2,499Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs…☆1,817Updated this week
- A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep lear…☆5,067Updated this week
- NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone…☆10,559Updated this week
- CUDA Library Samples☆1,519Updated last week
- oneAPI Deep Neural Network Library (oneDNN)☆3,579Updated this week
- Transformer-related optimization, including BERT, GPT☆5,773Updated 5 months ago
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it☆422Updated 2 weeks ago
- CUDA Templates for Linear Algebra Subroutines☆5,359Updated this week
- CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.☆2,325Updated 2 weeks ago
- PyTorch extensions for high performance and large scale training.☆3,151Updated 3 weeks ago
- CUDA Python Low-level Bindings☆850Updated 2 weeks ago
- TensorFlow/TensorRT integration☆737Updated 9 months ago
- ONNX-TensorRT: TensorRT backend for ONNX☆2,912Updated last week
- C++ extensions in PyTorch☆988Updated last month
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R…☆2,152Updated this week
- Enabling PyTorch on XLA Devices (e.g. Google TPU)☆2,449Updated this week
- A Python package that extends the official PyTorch to easily obtain better performance on Intel platforms☆1,554Updated this week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl☆1,669Updated 11 months ago
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/☆1,169Updated this week
- NCCL Tests☆819Updated last month
- High-efficiency floating-point neural network inference operators for mobile, server, and Web☆1,812Updated this week
- A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch☆8,312Updated 3 weeks ago
- DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for comm…☆2,172Updated last week
- A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.☆1,487Updated 2 months ago
- TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.☆846Updated this week
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.☆692Updated last week
- CUDA Core Compute Libraries☆1,132Updated this week
- TensorFlow Estimator☆301Updated 7 months ago