seth-lu / Im2win
☆11 Updated last year
Related projects:
- MLPerf™ Mobile models ☆24 Updated last month
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆34 Updated this week
- Python Interface to HIP and hiprtc Library ☆9 Updated 10 months ago
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24) ☆27 Updated last month
- Dynamic Neural Architecture Search Toolkit ☆28 Updated 3 months ago
- Loop Nest - Linear algebra compiler and code generator. ☆22 Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 Updated 5 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆17 Updated 10 months ago
- Adaptive neighbor sampling for temporal GNN ☆10 Updated 7 months ago
- benchmarking some transformer deployments ☆26 Updated last year
- NASRec Weight Sharing Neural Architecture Search for Recommender Systems ☆26 Updated 9 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆44 Updated 2 weeks ago
- Benchmarks to capture important workloads. ☆28 Updated 3 months ago
- ☆38 Updated 9 months ago
- Learning Compiler Pass Orders using Coreset and Normalized Value Prediction. (ICML 2023) ☆17 Updated last year
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆33 Updated last year
- Drastically Reducing the Number of Trainable Parameters in Deep CNNs by Inter-layer Kernel-sharing ☆12 Updated last year
- Experimental scripts for researching data adaptive learning rate scheduling. ☆23 Updated 11 months ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… ☆30 Updated last year
- Tutorials to GPU programming. Reading notes. ☆10 Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆18 Updated 3 months ago
- TAO Toolkit deep learning networks with TensorFlow 1.x backend ☆10 Updated 7 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆58 Updated 6 months ago
- ☆18 Updated last year
- ☆16 Updated this week
- A tracing JIT compiler for PyTorch ☆12 Updated 2 years ago
- Simple and fast low-bit matmul kernels in CUDA ☆48 Updated this week
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆36 Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆37 Updated this week
- ☆21 Updated this week