CoffeeBeforeArch / bits_of_architecture
Slides from the "Bits of Architecture" series on YouTube
☆19 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for bits_of_architecture
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆17 · Updated 4 years ago
- Companion Repository for the Lecture Slides for the Clang Libraries ☆88 · Updated 8 months ago
- Code examples for tutoring modern C++ ☆89 · Updated 4 months ago
- X86 CPU topics overview for developers, oriented towards performance ☆193 · Updated last month
- ☆83 · Updated last year
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆55 · Updated 3 weeks ago
- A small library and kernel module for easy access to x86 performance monitor counters under Linux. ☆94 · Updated 6 months ago
- A profiler to disclose and quantify hardware features on GPUs. ☆162 · Updated 2 years ago
- Very low-overhead timer/counter interfaces for C on Intel 64 processors. ☆116 · Updated 5 years ago
- Slides and other materials from CppCon2021 ☆97 · Updated last year
- ☆124 · Updated last week
- Task graph-based asynchronous programming system using C++ coroutine ☆84 · Updated 9 months ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 · Updated 10 months ago
- C++20 Coroutines and io_uring ☆47 · Updated last year
- L3: Lightweight Logging Library. A very small 'C' library to generate low-footprint, non-intrusive, high-performance logging of trace me… ☆89 · Updated 2 months ago
- A collection of performance analysis tools, recipes, handy scripts, microbenchmarks & more ☆117 · Updated this week
- C++20 Memory Allocators ☆28 · Updated 2 months ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆52 · Updated 2 years ago
- Object Introspection (OI) enables on-demand, hierarchical profiling of objects in arbitrary C/C++ programs with no recompilation. ☆165 · Updated 3 weeks ago
- A low-latency LRU approximation cache in C++ using CLOCK second-chance algorithm. Multi level cache too. Up to 2.5 billion lookups per se… ☆66 · Updated 10 months ago
- Utilities for accessing AMD's Machine-Readable GPU ISA Specifications. ☆21 · Updated 2 months ago
- Demonstration of various hardware effects on CUDA GPUs. ☆359 · Updated last year
- GPU B-Tree with support for versioning (snapshots). ☆44 · Updated 3 weeks ago
- C++ Custom memory allocators ☆51 · Updated 4 years ago
- Serial and parallel implementations of matrix multiplication ☆35 · Updated 3 years ago
- ROB size testing utility ☆135 · Updated 2 years ago
- Learn LLVM 17, published by Packt ☆141 · Updated 5 months ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27 · Updated 2 months ago
- ☆44 · Updated 5 months ago
- AVX-512 documentation beyond what Intel provides ☆42 · Updated 11 months ago