nebuly-ai / exploring-AI-optimization
Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient
Related projects:
- ML/DL Math and Method notes
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
- End-to-End LLM Guide
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
- Code used for the "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po…
- Google TPU optimizations for transformer models
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
- Mixed precision training from scratch with Tensors and CUDA
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…
- Experiments with inference on Llama
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
- A high-throughput and memory-efficient inference and serving engine for LLMs
- A set of scripts and notebooks on LLM fine-tuning and dataset creation
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models.
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU)
- Context manager to profile the forward and backward times of PyTorch's nn.Module
- Experimental PyTorch-native float8 training UX
- A scalable & efficient active learning / data selection system for everyone.
- A library to analyze PyTorch traces.
- Functional local implementations of the main model-parallelism approaches
- Large-scale 4D-parallelism pre-training for 🤗 Transformers in Mixture of Experts *(still a work in progress)*
- Torch Distributed Experimental
- Code for the paper "KNAS: Green Neural Architecture Search"
- Blazing-fast training of 🤗 Transformers on Graphcore IPUs
- Implementation of a Transformer, written entirely in Triton
- Simple and fast low-bit matmul kernels in CUDA
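Several entries above (mixed precision training from scratch, the experimental float8 training UX) revolve around the same core trick: run compute in a narrow float format, and multiply the loss by a large scale factor so small gradients don't underflow to zero before the optimizer unscales them. A minimal, framework-free sketch of that underflow problem and the loss-scaling fix, using Python's `struct` module to round values to IEEE float16 (the function name and constants here are illustrative, not taken from any of the listed repos):

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

# A tiny gradient, typical of late-stage training, underflows to zero in fp16:
grad = 1e-8
print(to_fp16(grad))             # 0.0 -- the update is silently lost

# Loss scaling: multiply the loss (and hence every gradient) by a large
# constant before the backward pass, so gradients land in fp16's range...
scale = 65536.0                  # 2**16, a commonly used initial loss scale
scaled = to_fp16(grad * scale)
print(scaled)                    # a small but nonzero fp16 value

# ...then unscale in full precision before the optimizer step.
recovered = scaled / scale
print(abs(recovered - grad) / grad < 0.01)  # True: within fp16 rounding error
```

In a real trainer the scale is adjusted dynamically: it is backed off when scaled gradients overflow to inf/NaN and grown again after a run of stable steps.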
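The "context manager to profile the forward and backward times" entry points at a handy pattern: wrap each phase of the training step in a named timing scope instead of scattering `time.perf_counter()` calls around. A framework-free sketch of the pattern (the listed repo hooks into `nn.Module`; everything below, including the phase names and the `sleep` stand-ins, is illustrative):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time per labeled phase, in seconds.
timings = defaultdict(float)

@contextmanager
def profile(phase: str):
    """Time the enclosed block and add the elapsed time to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start

# Stand-ins for a model's forward and backward passes.
with profile("forward"):
    time.sleep(0.02)
with profile("backward"):
    time.sleep(0.04)

for phase, seconds in timings.items():
    print(f"{phase}: {seconds * 1e3:.1f} ms")
```

In the PyTorch version, the same scopes would typically be opened from module forward hooks and autograd backward hooks, so every `nn.Module` gets timed without modifying its code.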
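For the checkpointing entry, one concern any such library has to handle is crash safety: if the process dies mid-save, the last good checkpoint must survive. The standard trick is to serialize into a temporary file in the same directory and atomically rename it over the target. A minimal sketch with pickled state dicts (function names are illustrative and unrelated to the listed library's API):

```python
import os
import pickle
import tempfile

def save_checkpoint(state: dict, path: str) -> None:
    """Write state to path atomically: serialize to a temp file in the same
    directory, then rename it over the target in one step, so a crash
    mid-write can never leave a half-written checkpoint at path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())     # make sure the bytes hit the disk first
        os.replace(tmp_path, path)   # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)          # clean up the partial temp file
        raise

def load_checkpoint(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)

# Demo round-trip in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
save_checkpoint({"step": 1000, "loss": 0.25}, path)
print(load_checkpoint(path))  # {'step': 1000, 'loss': 0.25}
```

Distributed checkpointing adds sharding and coordination on top, but each rank's write still reduces to this write-then-rename step.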