google / gematria
Machine learning for machine code.
☆94 · Updated 2 months ago
Alternatives and similar repositories for gematria
Users who are interested in gematria are comparing it to the libraries listed below.
- ☆85 · Updated this week
- ☆102 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆48 · Updated 4 months ago
- Custom PTX Instruction Benchmark · ☆137 · Updated 10 months ago
- MLIR-based partitioning system · ☆157 · Updated this week
- ☆83 · Updated last month
- TORCH_LOGS parser for PT2 · ☆70 · Updated last week
- LLM training in simple, raw C/CUDA · ☆109 · Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning · ☆294 · Updated this week
- A GPU-driven system framework for scalable AI applications · ☆123 · Updated 11 months ago
- High-Performance SGEMM on CUDA devices · ☆115 · Updated 11 months ago
- ☆27 · Updated 9 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL · ☆130 · Updated last year
- Unified compiler/runtime for interfacing with PyTorch Dynamo. · ☆104 · Updated 3 weeks ago
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings. · ☆117 · Updated 2 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆113 · Updated this week
- Tenstorrent console based hardware information program · ☆58 · Updated this week
- Experiments and prototypes associated with IREE or MLIR · ☆56 · Updated last year
- Benchmarks to capture important workloads. · ☆31 · Updated 11 months ago
- Library to interface Compilers and ML models for ML-Enabled Compiler Optimizations · ☆20 · Updated 2 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels · ☆182 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆66 · Updated 3 weeks ago
- GPUOcelot: A dynamic compilation framework for PTX · ☆219 · Updated 11 months ago
- Source code for "BenchPress: A Deep Active Benchmark Generator", PACT 2022 · ☆21 · Updated 2 years ago
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research · ☆115 · Updated 2 years ago
- Nvidia Instruction Set Specification Generator · ☆309 · Updated last year
- A lightweight memory allocator for hardware-accelerated machine learning · ☆179 · Updated 3 months ago
- Attention in SRAM on Tenstorrent Grayskull · ☆40 · Updated last year
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models · ☆50 · Updated 2 weeks ago
- A Top-Down Profiler for GPU Applications · ☆22 · Updated last year