BodhiHu / L-Mul
C implementation of the L-Mul f32/f16 multiplications from the paper: https://arxiv.org/html/2410.00907
☆27 · Updated 7 months ago
Alternatives and similar repositories for L-Mul
Users who are interested in L-Mul are comparing it to the repositories listed below
- Can I make an *optimizing* compiler under 1k lines of code? ☆56 · Updated 2 months ago
- C23 Checked Arithmetic ☆128 · Updated 5 months ago
- Wyrm is a GCC GIMPLE to LLVM IR transpiler ☆55 · Updated last year
- A fast implementation of log() and exp() ☆53 · Updated 2 years ago
- A collection of some lockfree datastructures ☆61 · Updated 2 years ago
- Rutgers APL correctly rounded math library ☆29 · Updated 4 years ago
- Bytecode interpreter ☆72 · Updated 3 months ago
- Bistra is a domain-specific language designed to generate high-performance kernels (such as GEMMs, convolutions, etc). The program is des… ☆6 · Updated last year
- A very fast 64-bit PRNG with a 2^128 period, proven injectivity, passing BigCrush & PractRand (32TB). ☆68 · Updated this week
- Modeling futexes in TLA+ ☆20 · Updated 7 months ago
- ☆31 · Updated 3 years ago
- A header-only portability and boilerplate library for C ☆22 · Updated last year
- A tiny CPU rasterization engine accompanying a tutorial series on writing a CPU rasterizer ☆89 · Updated 6 months ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last month
- A combined repository for all RLIBM prototypes ☆45 · Updated 7 months ago
- A GLSL compiler targeting SPIR-V mlir ☆20 · Updated 6 months ago
- A fast, zero dependency, single-header WebAssembly interpreter ☆36 · Updated last year
- zip_vector in-memory compressed variable length integer array ☆17 · Updated 2 years ago
- The little FFT library ☆16 · Updated 9 months ago
- Tiny optimizing JIT compiler backend. ☆46 · Updated 3 months ago
- A header-only C++ library for writing compiler/interpreter frontends. ☆14 · Updated last month
- moderngpu algorithms for C++ shaders ☆16 · Updated 4 years ago
- ☆296 · Updated last year
- A tagged-pointer type for C++. ☆32 · Updated last year
- GPU hardware for Signed Distance Fields ☆53 · Updated 11 months ago
- A rethinking of the C time library ☆10 · Updated 2 months ago
- The code to accompany "Constant Time Stateless Shuffling and Grouping" ☆45 · Updated last year
- Fast vectorized (SSE 4.1) range coder for 8-bit alphabets ☆25 · Updated 2 years ago
- A header only library implementing common mathematical functions using SIMD intrinsics ☆104 · Updated 2 months ago
- ☆18 · Updated 10 months ago