haifeng-jin / keras-benchmarks
☆12 · Updated last year
Alternatives and similar repositories for keras-benchmarks
Users interested in keras-benchmarks are comparing it to the libraries listed below.
- This is a port of Mistral-7B model in JAX ☆32 · Updated last year
- ☆21 · Updated 8 months ago
- Cuda extensions for PyTorch ☆11 · Updated 7 months ago
- Optimized primitives for collective multi-GPU communication ☆10 · Updated last year
- High-Performance SGEMM on CUDA devices ☆112 · Updated 10 months ago
- Collection of scripts to build PyTorch and the domain libraries from source. ☆12 · Updated 2 weeks ago
- JAX-Toolbox ☆363 · Updated this week
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic … ☆103 · Updated this week
- TorchFix - a linter for PyTorch-using code with autofix support ☆151 · Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆108 · Updated last year
- ☆53 · Updated last year
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆125 · Updated 2 months ago
- ☆51 · Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆308 · Updated this week
- Experiment of using Tangent to autodiff triton ☆80 · Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆362 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 · Updated 3 months ago
- ☆28 · Updated 4 months ago
- Material for the SC22 Deep Learning at Scale Tutorial ☆41 · Updated 2 years ago
- Hand-Rolled GPU communications library ☆65 · Updated this week
- Effective transpose on Hopper GPU ☆26 · Updated 2 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated last week
- Where GPUs get cooked 👩‍🍳🔥 ☆317 · Updated 2 months ago
- ☆18 · Updated last week
- jax-triton contains integrations between JAX and OpenAI Triton ☆435 · Updated this week
- A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism. ☆149 · Updated last week
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆68 · Updated 7 months ago
- MLPerf™ logging library ☆37 · Updated last month
- Parallel framework for training and fine-tuning deep neural networks ☆68 · Updated 2 weeks ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆110 · Updated last year