pujyam / simian
Simian: a process-oriented, conservative, JIT-compiled parallel discrete event simulation (PDES) engine from LANL
☆11 Updated 3 years ago
Alternatives and similar repositories for simian:
Users interested in simian are comparing it to the libraries listed below.
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 Updated 5 years ago
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware ☆16 Updated 3 years ago
- Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware" ☆28 Updated 3 years ago
- A GPU FP32 computation method with Tensor Cores. ☆20 Updated 2 years ago
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/… ☆18 Updated last year
- A unified programming framework for high and portable performance across FPGAs and GPUs ☆11 Updated last week
- ☆13 Updated 10 years ago
- ☆21 Updated last month
- Multiple 1-stencil implementations using NVIDIA CUDA. ☆13 Updated 7 years ago
- ☆12 Updated 3 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 5 months ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 Updated 5 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆28 Updated 6 months ago
- SpV8 is a SpMV kernel written in AVX-512. Artifact for our SpV8 paper @ DAC '21. ☆29 Updated 4 years ago
- ☆17 Updated 2 years ago
- ☆30 Updated 2 years ago
- GPU Performance Advisor ☆64 Updated 2 years ago
- CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels ☆32 Updated 4 years ago
- agile hardware-software co-design ☆47 Updated 3 years ago
- This adds partial support of AVX2 and AVX-512 to gem5. ☆14 Updated last year
- Multi-target compiler for Sum-Product Networks, based on MLIR and LLVM. ☆23 Updated 4 months ago
- ☆33 Updated 2 years ago
- Performance Prediction Toolkit ☆51 Updated 3 months ago
- A retargetable and extensible synthesis-based compiler for modern hardware architectures ☆10 Updated this week
- A PIM instrumentation, compilation, execution, simulation, and evaluation repository for BLIMP-style architectures. ☆18 Updated 2 years ago
- Arrow Matrix Decomposition - Communication-Efficient Distributed Sparse Matrix Multiplication ☆15 Updated last year
- 📥 🎯 (1,4/4) an MLIR-based toolchain with Vitis HLS LLVM input/output targeting FPGAs. ☆13 Updated 2 years ago
- An HPL-AI implementation for Fugaku ☆20 Updated 3 years ago
- ColTraIn HBFP Training Emulator ☆16 Updated 2 years ago
- CPU micro benchmarks ☆52 Updated 2 weeks ago