sujunyan / tex-gallery
☆14 · Updated last year
Alternatives and similar repositories for tex-gallery:
Users that are interested in tex-gallery are comparing it to the libraries listed below
- [EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs ☆74 · Updated 9 months ago
- Examples and instructions about using LLMs (especially ChatGPT) during a PhD ☆109 · Updated 2 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆18 · Updated 2 years ago
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch ☆32 · Updated 7 months ago
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo ☆11 · Updated 2 years ago
- MobiSys#114 ☆21 · Updated last year
- ☆9 · Updated last year
- ☆9 · Updated 2 years ago
- ☆26 · Updated last year
- ☆12 · Updated 2 years ago
- Tutorial for writing custom PyTorch C++/CUDA kernels, applied to volume rendering (NeRF) ☆26 · Updated last year
- An Attention Superoptimizer ☆21 · Updated 2 months ago
- A Telegram bot that sends you a message when the GPU is in use ☆9 · Updated 10 months ago
- PKU LaTeX ☆49 · Updated 2 weeks ago
- Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware" ☆28 · Updated 3 years ago
- Official repository for "IPDPS'24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices" ☆19 · Updated last year
- Chinese translation of Philip Guo's The PhD Grind ☆73 · Updated 2 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆53 · Updated 7 months ago
- ngAP's artifact for ASPLOS'24 ☆21 · Updated 2 months ago
- An external memory allocator example for PyTorch. ☆14 · Updated 3 years ago
- PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity ☆107 · Updated last week
- A Sparse-tensor Communication Framework for Distributed Deep Learning ☆13 · Updated 3 years ago
- Quantized Attention on GPU ☆45 · Updated 4 months ago
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling ☆13 · Updated last year
- Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning" ☆24 · Updated 2 years ago
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning purposes only ☆13 · Updated 9 months ago
- My paper/code reading notes in Chinese ☆46 · Updated 10 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 · Updated 5 years ago
- A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling. ☆42 · Updated 2 years ago
- SOTA Learning-augmented Systems ☆35 · Updated 2 years ago