nebuly-ai / exploring-AI-optimization
Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient
⭐ 113 · Updated last year
Alternatives and similar repositories for exploring-AI-optimization:
Users interested in exploring-AI-optimization are comparing it to the libraries listed below.
- ⭐ 29 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs · ⭐ 87 · Updated this week
- ML/DL math and method notes · ⭐ 59 · Updated last year
- Functional local implementations of the main model-parallelism approaches · ⭐ 95 · Updated 2 years ago
- Google TPU optimizations for Transformers models · ⭐ 104 · Updated 2 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) · ⭐ 51 · Updated this week
- Context manager to profile the forward and backward times of PyTorch's nn.Module · ⭐ 84 · Updated last year
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state-of-the-art machine learning models access… · ⭐ 114 · Updated last year
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory and energy consumption · ⭐ 98 · Updated last year
- Experiments with inference on Llama · ⭐ 104 · Updated 9 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN · ⭐ 68 · Updated 9 months ago
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models · ⭐ 136 · Updated 7 months ago
- Large-scale 4D parallelism pre-training for 🤗 Transformers in Mixture of Experts *(still work in progress)* · ⭐ 81 · Updated last year
- ML model training for edge devices · ⭐ 162 · Updated last year
- Trade any tensors over the network · ⭐ 30 · Updated last year
- Blazing fast training of 🤗 Transformers on Graphcore IPUs · ⭐ 84 · Updated last year
- End-to-end LLM guide · ⭐ 104 · Updated 8 months ago
- Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024) · ⭐ 177 · Updated 11 months ago
- Interactive performance profiling and debugging tool for PyTorch neural networks · ⭐ 59 · Updated 2 months ago
- Collection of kernels written in the Triton language · ⭐ 114 · Updated last month
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day · ⭐ 255 · Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ⭐ 127 · Updated last year
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components · ⭐ 189 · Updated this week
- ⭐ 248 · Updated 8 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference · ⭐ 54 · Updated last month
- Make Triton easier · ⭐ 47 · Updated 9 months ago
- Home for the OctoML PyTorch Profiler · ⭐ 108 · Updated last year
- Documented and unit-tested educational deep learning framework with autograd, built from scratch · ⭐ 111 · Updated 11 months ago
- Code used for the "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… · ⭐ 87 · Updated last year
- PDFs and Codelabs for the Efficient Deep Learning book · ⭐ 191 · Updated last year