drisspg / driss_torch
CUDA extensions for PyTorch
☆11 Updated 2 months ago
Alternatives and similar repositories for driss_torch
Users who are interested in driss_torch are comparing it to the libraries listed below.
- ☆21 Updated 3 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆114 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆134 Updated last year
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- High-Performance SGEMM on CUDA devices ☆96 Updated 5 months ago
- extensible collectives library in triton ☆86 Updated 2 months ago
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆61 Updated 2 months ago
- ☆13 Updated 3 months ago
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆276 Updated 3 weeks ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆170 Updated this week
- ☆19 Updated last month
- Explore training for quantized models ☆18 Updated last week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆51 Updated last month
- ☆28 Updated 5 months ago
- Reference Kernels for the Leaderboard ☆60 Updated last week
- Solve puzzles. Learn CUDA. ☆64 Updated last year
- ☆12 Updated last year
- Collection of kernels written in Triton language ☆132 Updated 2 months ago
- PyTorch Single Controller ☆231 Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆224 Updated 10 months ago
- jax-triton contains integrations between JAX and OpenAI Triton ☆403 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆46 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 Updated 3 months ago
- ☆11 Updated 2 months ago
- This is a port of Mistral-7B model in JAX ☆32 Updated 11 months ago
- ☆109 Updated last year
- Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake usin… ☆26 Updated 3 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated 3 months ago
- [WIP] Better (FP8) attention for Hopper ☆30 Updated 4 months ago