aime-team / pytorch-benchmarks
A benchmark framework for PyTorch
☆26 · Updated 4 months ago
Alternatives and similar repositories for pytorch-benchmarks
Users who are interested in pytorch-benchmarks are comparing it to the libraries listed below.
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated 7 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- A collection of reproducible inference engine benchmarks ☆32 · Updated 3 months ago
- A parallel framework for training deep neural networks ☆63 · Updated 4 months ago
- ☆74 · Updated 4 months ago
- Adaptive Parallel PDF Parsing and Resource Scaling Engine ☆49 · Updated 2 months ago
- Make triton easier ☆47 · Updated last year
- Example ML projects that use the Determined library. ☆32 · Updated 10 months ago
- Repository of machine learning benchmarks ☆39 · Updated this week
- ☆55 · Updated 2 months ago
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆92 · Updated 2 years ago
- llama.cpp to PyTorch Converter ☆34 · Updated last year
- Unit Scaling demo and experimentation code ☆16 · Updated last year
- Torch Distributed Experimental ☆117 · Updated last year
- Supplementary material for our paper "Compute Trends Across Three Eras of Machine Learning". ☆41 · Updated 3 years ago
- ☆47 · Updated this week
- Benchmarking PyTorch 2.0 different models ☆20 · Updated 2 years ago
- MLPerf™ logging library ☆36 · Updated last week
- Docker image NVIDIA GH200 machines - optimized for vllm serving and hf trainer finetuning ☆47 · Updated 5 months ago
- A Python library transfers PyTorch tensors between CPU and NVMe ☆118 · Updated 8 months ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 · Updated last week
- Train, tune, and infer Bamba model ☆131 · Updated 2 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆142 · Updated this week
- Cray-LM unified training and inference stack. ☆22 · Updated 6 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆50 · Updated this week
- train with kittens! ☆62 · Updated 9 months ago
- High-Performance Linpack Benchmark adopted version for GPU backend ☆11 · Updated 2 years ago
- Linear Attention Sequence Parallelism (LASP) ☆85 · Updated last year
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Updated last year
- Benchmarks to capture important workloads. ☆31 · Updated 6 months ago