haifeng-jin / keras-benchmarks
☆12 · Updated last year
Alternatives and similar repositories for keras-benchmarks
Users interested in keras-benchmarks are comparing it to the repositories listed below.
- CUDA extensions for PyTorch · ☆12 · Updated last month
- ☆55 · Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning · ☆325 · Updated last week
- A port of the Mistral-7B model to JAX · ☆32 · Updated last year
- High-performance SGEMM on CUDA devices · ☆115 · Updated 11 months ago
- A user-friendly toolchain that enables seamless execution of ONNX models using JAX as the backend · ☆130 · Updated 3 weeks ago
- Benchmarks of different devices I have come across · ☆39 · Updated 4 months ago
- JAX-Toolbox · ☆377 · Updated this week
- LLM training in simple, raw C/CUDA · ☆110 · Updated last year
- jax-triton contains integrations between JAX and OpenAI Triton · ☆436 · Updated last month
- Neural networks for JAX · ☆84 · Updated last year
- Additional multi-backend functionality for Keras 3 · ☆16 · Updated last year
- Tokamax: a GPU and TPU kernel library · ☆158 · Updated this week
- Notes and artifacts from the ONNX steering committee · ☆28 · Updated 3 weeks ago
- An experiment in using Tangent to autodiff Triton · ☆81 · Updated last year
- Where GPUs get cooked 👩‍🍳🔥 · ☆347 · Updated 3 months ago
- A fusion code generator for NVIDIA GPUs (commonly known as "nvFuser") · ☆371 · Updated this week
- A JAX-based library for building transformers, including implementations of GPT, Gemma, LLaMA, Mixtral, Whisper, Swin, ViT, and more · ☆297 · Updated last year
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic … · ☆105 · Updated this week
- Implementation of Flash Attention in JAX · ☆223 · Updated last year
- A collection of scripts to build PyTorch and the domain libraries from source · ☆13 · Updated 2 months ago
- Write a fast kernel and run it on Discord; see how you compare against the best! · ☆66 · Updated last week
- Minimal yet performant LLM examples in pure JAX · ☆226 · Updated 2 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆48 · Updated 4 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ☆154 · Updated 2 years ago
- ☆342 · Updated last week
- Hand-rolled GPU communications library · ☆76 · Updated last month
- ☆21 · Updated 10 months ago
- ☆26 · Updated last year
- Learning about CUDA by writing PTX code · ☆151 · Updated last year