nebuly-ai / exploring-AI-optimization
Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient
★112, updated last year
Related projects
Alternatives and complementary repositories for exploring-AI-optimization
- ML/DL Math and Method notes (★57, updated 11 months ago)
- experiments with inference on llama (★105, updated 5 months ago)
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day (★252, updated last year)
- ML model training for edge devices (★157, updated last year)
- Packages and instructions for training and inference of LLMs on NVIDIA's new GH200 machines (★19, updated 2 months ago)
- A high-throughput and memory-efficient inference and serving engine for LLMs (★253, updated last month)
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (★89, updated this week)
- A collection of all available inference solutions for the LLMs (★73, updated 2 months ago)
- End-to-End LLM Guide (★97, updated 4 months ago)
- Context Manager to profile the forward and backward times of PyTorch's nn.Module (★83, updated last year)
- Cataloging released Triton kernels. (★138, updated 2 months ago)
- A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… (★260, updated this week)
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… (★146, updated this week)
- Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. (★134, updated 3 months ago)
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024 (★173, updated 7 months ago)
- MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible. (★154, updated 2 months ago)
- Trade any tensors over the network (★30, updated last year)
- ★29, updated last year
- Fast Inference of MoE Models with CPU-GPU Orchestration (★173, updated this week)
- Outlining techniques for improving the training performance of your PyTorch model without compromising its accuracy (★124, updated last year)
- ring-attention experiments (★97, updated last month)
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". (★262, updated last year)
- An open-source efficient deep learning framework/compiler, written in Python. (★652, updated last week)
- Blazing fast training of 🤗 Transformers on Graphcore IPUs (★82, updated 8 months ago)
- A set of scripts and notebooks on LLM fine-tuning and dataset creation (★93, updated last month)
- ★13, updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters (★238, updated 4 months ago)
- Drift detection module for machine learning pipelines. (★21, updated last year)
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) (★153, updated this week)
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs (★167, updated 2 weeks ago)