Kedreamix / pytorch-cppcuda-tutorial
Tutorial for writing custom PyTorch C++/CUDA kernels, applied to volume rendering (NeRF)
☆19Updated 9 months ago
Related projects:
- A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch.☆25Updated 5 months ago
- FP8 flash attention implemented on the Ada architecture using the CUTLASS repository☆46Updated last month
- Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS☆159Updated 3 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆33Updated 6 months ago
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar…☆54Updated 6 months ago
- A list of papers, docs, codes about efficient AIGC. This repo is aimed to provide the info for efficient AIGC research, including languag…☆142Updated 4 months ago
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.☆114Updated last week
- A parallel VAE that avoids OOM in high-resolution image generation☆34Updated 2 months ago
- List of papers related to Vision Transformers quantization and hardware acceleration in recent AI conferences and journals.☆47Updated 3 months ago
- ☆59Updated 2 months ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.☆69Updated 4 months ago
- Tutorials for writing high-performance GPU operators in AI frameworks.☆118Updated last year
- Patch convolution to avoid large GPU memory usage of Conv2D☆73Updated 3 months ago
- The official implementation of the NeurIPS 2022 paper Q-ViT.☆77Updated last year
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La…☆22Updated 2 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…☆55Updated 5 months ago
- ☆74Updated last week
- An all-in-one repository of awesome LLM pruning papers, integrating useful resources and insights.☆31Updated last month
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…☆38Updated 11 months ago
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…☆46Updated last year
- 📖A small curated list of Awesome SD/DiT/ViT/Diffusion Inference with Distributed/Caching/Sampling: DistriFusion, PipeFusion, AsyncDiff, …☆64Updated 2 weeks ago
- PyTorch code for Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers☆27Updated 2 weeks ago
- ☆134Updated last year
- Code Repository of Evaluating Quantized Large Language Models☆89Updated last week
- Learning how CUDA works☆150Updated last month
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU.☆36Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆82Updated 6 months ago
- ☆60Updated last month
- [CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Mo…☆53Updated last month
- ☆90Updated 6 months ago